Spatiotemporal Prediction of Ground Surface Deformation Using TPE-Optimized Deep Learning

Maoqi Liu; Sichun Long; Tao Li; Wandi Wang; Jianan Li

doi:10.3390/rs18020234

Highlights

What are the main findings?

This paper establishes a unified workflow for spatiotemporal deformation prediction and susceptibility mapping using InSAR data. It systematically compares six hybrid architectures and replaces manual parameter tuning with a TPE Bayesian optimization. TPE consistently outperforms manual tuning in terms of RMSE, MAE, and R², with statistical validation confirming reproducible differences. The ResNet + Transformer architecture demonstrates the best performance. Based on the Top 10 ensemble results, the probabilistic susceptibility and uncertainty products generated using quantile thresholds and hyper-threshold probabilities, as well as the residuals and high uncertainty, exhibit structural clustering near fault and mining boundaries.

What is the implication of the main finding?

Results demonstrate that automated hyperparameter optimization significantly reduces human selection bias and provides a fairer performance benchmark for different architectures, thereby enhancing model usability in engineering scenarios. Integrating probability and uncertainty mapping transforms model outputs into actionable risk information, supporting monitoring prioritization for encryption and field verification. The consistent distribution of high uncertainty and high residuals within geological discontinuities indicates the model’s capability to identify physically complex zones and flag potential failure locations, thereby mitigating misjudgment risks in critical areas.

Abstract

Surface deformation induced by the extraction of natural resources constitutes a non-stationary spatiotemporal process. Modeling surface deformation time series obtained through Interferometric Synthetic Aperture Radar (InSAR) technology using deep learning methods is crucial for disaster prevention and mitigation. However, the complexity of model hyperparameter configuration and the lack of interpretability in the resulting predictions constrain its engineering applications. To enhance the reliability of model outputs and their decision-making value for engineering applications, this study presents a workflow that combines a Tree-structured Parzen Estimator (TPE)-based Bayesian optimization approach with ensemble inference. Using the Rhineland coalfield in Germany as a case study, we systematically evaluated six deep learning architectures in conjunction with various spatiotemporal coding strategies. Pairwise comparisons were conducted using a Welch t-test to evaluate the performance differences across each architecture under two parameter-tuning approaches. The Benjamini–Hochberg method was applied to control the false discovery rate (FDR) at 0.05 for multiple comparisons. The results indicate that TPE-optimized models demonstrate significantly improved performance compared to their manually tuned counterparts, with the ResNet+Transformer architecture yielding the most favorable outcomes. A comprehensive analysis of the spatial residuals further revealed that TPE optimization not only enhances average accuracy, but also mitigates the model’s prediction bias in fault zones and mineralize areas by improving the spatial distribution structure of errors. Based on this optimal architecture, we combined the ten highest-performing models from the optimization stage to generate a quantile-based susceptibility map, using the ensemble median as the central predictor. Uncertainty was quantified from three complementary perspectives: ensemble spread, class ambiguity, and classification confidence. Our analysis revealed spatial collinearity between physical uncertainty and absolute residuals. This suggests that uncertainty is more closely related to the physical complexity of geological discontinuities and human-disturbed zones, rather than statistical noise. In the analysis of super-threshold probability, the threshold sensitivity exhibited by the mining area reflects the widespread yet moderate impact of mining activities. By contrast, the fault zone continues to exhibit distinct high-probability zones, even under extreme thresholds. It suggests that fault-controlled deformation is more physically intense and poses a greater risk of disaster than mining activities. Finally, we propose an engineering decision strategy that combines uncertainty and residual spatial patterns. This approach transforms statistical diagnostics into actionable, tiered control measures, thereby increasing the practical value of susceptibility mapping in the planning of natural resource extraction.

Keywords:

InSAR; spatiotemporal prediction; deep learning; Bayesian optimization; susceptibility mapping

1. Introduction

Although natural resource extraction is the cornerstone of global economic development, the resulting geological disasters, such as surface deformation, pose an increasingly serious threat to the ecological environment, infrastructure safety, and sustainable social development [1,2]. Large-scale mining, oil and gas extraction, and groundwater abstraction can cause widespread land subsidence and landslides, resulting in significant ecological and environmental damage. These issues have caused significant economic losses in many parts of the world [3,4,5,6]. Thus, assessing the spatiotemporal development of such anthropogenic surface deformations is crucial for optimal early warning systems for hazards and resource management, as well as regional security [7]. Physics-based models, relying on simulation, can provide an understanding of deformations caused by given conditions [8], yet they fail to scale and become rational across complex geological areas [9]. Empirical models fail due to the vast nonlinearity and spatial heterogeneity of deformation mechanics [10]. Recently, however, deep learning has emerged as a powerful time series prediction method and plays a more critical role in geological hazard early warning systems [11,12]. For example, surface deformation in mining areas is heterogeneous on a spatial level, with a neighborhood dependence, and it accumulates temporally with nonlinear expansion over time [13,14,15]. Thus, there is a critical need for a model architecture that simultaneously accounts for spatial interactions and represents temporal developments, and there is a greater need for data-driven approaches that can learn such complex spatiotemporal patterns automatically.

InSAR can reconstruct displacement time series along the line of sight (LOS) from multi-temporal imagery, and it is capable of systematically characterizing the spatiotemporal evolution of deformation with millimeter-order accuracy, all-weather and all-day monitoring capability, and wide-area coverage [16,17]. InSAR has been widely used as an essential tool to monitor subsidence and other cascading hazards in resource exploitation areas [18]. With the rapid development of both sensor technology and processing algorithms, massive operational InSAR observations have been processed into data products and provided by platforms such as the European Ground Motion Service (EGMS). These operational data products provide scientists with long-term spatiotemporal datasets that have millions of measurement points spanning several years [19]. Although these long-term and large-scale datasets provide great opportunities for data-driven research on surface deformation, they also present great challenges to traditional analytical approaches due to the high dimensionality and complexity of the data. Therefore, intelligent algorithms that can process these massive datasets efficiently and deeply are urgently needed.

As an effective method to address these challenges and reshape the research paradigm of Earth science, deep learning (DL) has attracted extensive attention in recent years [7]. Different from traditional machine learning methods, DL models can automatically learn hidden spatiotemporal dependencies from massive remote sensing datasets with powerful nonlinear modeling ability and hierarchical feature extraction capability [20,21]. In terms of spatiotemporal prediction tasks, convolutional neural networks (CNNs) [22] and their variants, such as ResNet [23], have been widely used to model local spatial patterns and multi-scale features of deformation fields. For recurrent neural networks (RNNs), long short-term memory networks (LSTMs) have been widely used in surface deformation research to model the long-term dependence of deformation [24,25]. In addition, more recently, Transformer architectures [26] have also been applied in spatiotemporal prediction tasks, which have a powerful ability to model global spatiotemporal relationships due to their self-attention mechanisms [27]. Furthermore, for the irregular point-cloud data inputs extracted from InSAR, graph convolutional networks (GCNs) provide a powerful and more natural framework to model spatial topological relationships [28,29].

However, despite significant advances in spatiotemporal forecasting, existing research remains constrained by limitations in data representation integrity and prediction reliability. Firstly, most studies processing InSAR data tend to extract only partial feature points to construct discrete datasets [30,31]. This preprocessing disrupts the inherent spatial topology of point clouds, making it difficult to accurately capture the subtle deformation characteristics at the edges of subsidence funnels in mining areas. Secondly, existing ground deformation predictions primarily focus on enhancing prediction accuracy while generally neglecting the quantification of prediction uncertainty [32]. In geological hazard early warnings, single prediction values lacking confidence intervals often conceal potential risks in data-sparse or noisy regions. Although Monte Carlo Dropout (MC Dropout) [33] and Bayesian Neural Networks (BNNs) [34] offer theoretical solutions, their high computational costs limit practical application on large-scale InSAR datasets.

Furthermore, the practical application of deep learning is constrained by the difficulty of model optimization. The outstanding performance of modern deep learning models comes at the cost of extreme sensitivity to hyperparameter settings; for instance, the learning rate, network depth, and regularization strength entirely determine the training process and final performance [35]. However, grid search [36] and random search [37] methods for hyperparameter optimization (HPO) become prohibitively expensive and inefficient when faced with high-dimensional and complex hyperparameter spaces [38,39]. As a result, the model performance in most studies may be far from optimal, and the lack of tuning also hinders the comparability of studies [40]. Developing and applying systematic and automated HPO strategies are key to achieving the full potential of DL models and drawing robust research conclusions [41]. Bayesian optimization (BO) offers an efficient and powerful theoretical strategy to deal with this challenge [42,43]. By constructing a probabilistic surrogate of the objective function (e.g., validation loss) and employing acquisition functions to effectively balance exploration and exploitation, BO can find near-optimal hyperparameter configurations with far fewer evaluations than conventional methods [44,45] and techniques due to BO’s ability to efficiently explore complex and conditional hyperparameter spaces [46,47], and it has been increasingly applied in fields such as geohazard research [48,49].

To address these challenges, we developed a comprehensive spatiotemporal prediction workflow that integrates automated model optimization with uncertainty-quantified risk representation. Using the Rhineland coalfield in Germany as a representative natural resource exploitation area, the main contributions of this study are as follows: (1) We systematically constructed and evaluated six hybrid deep learning architectures for spatiotemporal deformation prediction. (2) We fully integrated a TPE-based Bayesian optimization framework into these architectures and, through rigorous statistical testing, quantitatively demonstrated its significant advantages over manual tuning, thereby providing a consistent reference for comparing model architectures and tuning strategies in mining-induced deformation prediction for the field. (3) By leveraging model ensembles generated during the TPE optimization process, we propose a low-cost uncertainty quantification method that seamlessly bridges technical model optimization with applied risk decision-making. This provides a practical paradigm for transforming advanced machine learning models into reliable and actionable disaster mitigation tools. (4) Moving beyond conventional accuracy assessments, we conducted in-depth residual diagnostics that revealed the intrinsic connections between prediction errors and physical entities (faults and mining areas), as well as their underlying spatial structures.

2. Study Area and Data

2.1. Experiment Area

The geographical location of the study area is indicated in Figure 1a, which illustrates its position within the Rhineland coalfield in North Rhine-Westphalia, Germany. The primary research zone, outlined in the blue rectangle, delineates the scope of investigation within this coalfield. As Europe’s largest opencast coal mine, it accounts for a substantial proportion of Germany’s power generation fuel and holds paramount significance for the nation’s energy economy. Since the mid-1950s, extensive lignite mining operations in this region have altered the landscape, as illustrated in Figure 1b, with severe surface deformation occurring around the mining areas. Interferometric Synthetic Aperture Radar (InSAR) data collected have shown extreme surface deformation in the Cologne Region of the coal field. Three large open-pit mines (Hambach, Garzweiler, and Inden) are located in the three regions that experienced the greatest amount of surface deformation. As indicated in Figure 1b, Hambach and Garzwinelandeiler are active mines, while Inden is inactive. The InSAR data used in this study were collected from EGMS.

Figure 1. Study area overview. (a) Study area. (b) Ground motion information within the region. The blue dashed box in (a) represents the study area, and (b) is a magnified view of the study area, where deformation data were obtained from EGMS. The base diagrams for (a,b) are digital elevation models.

2.2. Data

This study used Level 3 orthogonal products from the European Ground Motion Service, specifically the vertical displacement component, which was obtained from https://egms.land.copernicus.eu/. The dataset provides Interferometric Synthetic Aperture Radar time series spanning 5 January 2016 to 16 December 2021, with a temporal resolution of 6 days and a spatial resolution of 100 m. It was produced using Persistent Scatterer and Distributed Scatterer InSAR processing, and it was calibrated with high-precision Global Navigation Satellite System models. The product is delivered as a vector point cloud. Each coherent point contains a displacement time series and precise geolocation information, with velocity accuracy on the order of millimeters per year. The European Ground Motion Service generates orthogonal components by merging calibrated ascending and descending line-of-sight time series from the Level 2b product and comparing them with Global Navigation Satellite System velocity models. The orthogonal product provides vertical and east–west components; however, the north–south component is not derived from Interferometric Synthetic Aperture Radar [19]. Because our analysis focused on mining-induced subsidence, we only used the vertical component.

3. Methodology

3.1. Construction of the Spatiotemporal Dataset

The InSAR deformation monitoring data provided by EGMS are distributed as discrete high-coherence points rather than the regular grid structures typical of conventional remote sensing imagery. This discrete and irregular spatial distribution poses challenges for the direct application of deep learning models. To enable the use of CNN architectures, we first performed spatial interpolation to transform the data into a regular grid format suitable for CNN input. Specifically, a regular spatial grid was defined based on the coordinate extent of the monitoring points, and the grid size was determined accordingly. For each temporal snapshot, inverse distance weighting (IDW) interpolation was applied to estimate the values for spatially missing pixels, thereby generating continuous, regular two-dimensional grid data.

3.2. Principles of Spatiotemporal Prediction

Surface deformation in mining areas is a typical spatiotemporal evolutionary process, where future states are tightly linked to historical deformation sequences and profoundly influenced by complex spatial factors within the study region. To accurately capture and predict this process, we designed a hybrid deep learning architecture whose core strategy is to decouple spatiotemporal dependencies by employing dedicated modules for spatial representation and temporal dynamics. Such hybrid architectures that combine spatial and temporal modules have proven effective for complex Earth system data.

Given InSAR deformation observations from the past T time steps, at each time step t, the input

X_{t} \in R^{H \times W \times C}

is a two-dimensional multi-channel spatial tensor. The model’s task is to learn a mapping that predicts surface deformation for the next F time steps. Formally, let the historical sequence be

X_{1 : T} = {X_{1}, \dots, X_{T}}

and the F-step forecast be

{\hat{X}}_{T + 1 : T + F} = {{\hat{X}}_{T + 1}, \dots, {\hat{X}}_{T + F}}

. With learnable parameters

Θ

, the prediction objective is as follows:

{\hat{X}}_{T + 1 : T + F} = f (X_{1 : T}; Θ) .

(1)

3.2.1. Spatial Feature Extraction

The goal at this stage is to extract high-dimensional, abstract feature representations from the input two-dimensional spatial data at each individual time step t. These features encode not only the spatial distribution patterns of surface deformation, but also the nonlinear spatial correlations between multiple influencing factors and the deformation itself. When a CNN is used as the spatial feature extractor, the model leverages the local receptive fields and translation invariance of convolutional kernels to efficiently capture local patterns in the deformation field. When a GCN is adopted, the high-coherence points within the study area are abstracted as graph nodes, and edges are constructed according to spatial adjacency, thereby bringing non-Euclidean spatial structure into the modeling. By aggregating information from neighboring nodes to update the central node, GCNs can capture the long-range dependencies induced by geological structures or spatial topology. Regardless of the module choice, this stage is uniformly expressed as follows:

H_{t} = f_{spatial} (X_{t}; θ_{s}),

(2)

where

f_{spatial}

denotes the spatial feature extraction network (CNN or GCN), and

θ_{s}

are its learnable parameters. For CNN-based encoders,

H_{t} \in R^{H^{'} \times W^{'} \times D}

; for GCN-based encoders,

H_{t} \in R^{N_{t} \times D}

, where

N_{t}

is the number of graph nodes (high-coherence points) at time t and edges are defined by k-nearest neighbors or radius-based adjacency. This process transforms the original spatial data sequence into a compact and informative feature sequence

{H_{1}, \dots, H_{T}}

.

3.2.2. Temporal Dependency Modeling

After obtaining the spatial feature sequence at each time step, the temporal module captures the dynamic evolution of these features over time and learns the long-term dependencies of the deformation process. When an LSTM is used, its gating mechanism selectively retains critical historical information while filtering out noise; when a Transformer is employed, self-attention processes the entire feature sequence in parallel, computing the correlations between any two time steps to capture global temporal dependencies.

To interface spatial encoders with a unified temporal encoder, we first summarize each spatial feature

H_{t}

into a vector token:

z_{t} = ϕ (H_{t}) \in R^{D_{z}}, t = 1, \dots, T,

(3)

where

ϕ (\cdot)

is a per-time-step readout operator (e.g., the global average pooling for CNN features or graph readout for GCN features). The temporal encoder then maps the token sequence to a context vector:

c_{T} = f_{temporal} ({z_{1}, \dots, z_{T}}; θ_{t}) \in R^{d_{c}},

(4)

where

f_{temporal}

denotes the temporal modeling network (LSTM or Transformer), and

θ_{t}

are its learnable parameters. The context

c_{T}

provides a dynamic summary of the history and serves as the basis for multi-horizon decoding.

3.2.3. Prediction Decoding

Finally, the decoder maps the context vector

c_{T}

output from the temporal module to the predicted future deformation fields while preserving the target spatial resolution. Typically, this is achieved through one or more fully connected layers and/or transposed convolutional layers, which decode high-dimensional spatiotemporal representations into prediction maps aligned with the target grid. We adopt a direct multi-horizon strategy, in which the F future steps are produced in one pass:

{\hat{X}}_{T + 1 : T + F} = f_{decode} (c_{T}; θ_{d}),

(5)

where

f_{decode}

denotes the decoder network, and

θ_{d}

are its learnable parameters. For GCN-based spatial encoders, we retained the latest-step node embeddings

H_{T}

and applied a small MLP (conditioned on

c_{T}

) to obtain node-wise predictions, which was followed by grid projection to the target resolution (e.g., scatter or IDW-based upsampling) to ensure spatial alignment with the output grid. During training, errors were only computed within valid monitoring regions using a masked loss, and all of the parameters were optimized by backpropagation to minimize this objective.

Through the above modular design, the model achieves effective decoupling and deep integration of spatial correlations and temporal dynamics. However, the performance of such hybrid deep learning models is highly dependent on architectural choices and numerous hyperparameters. Manual tuning is time consuming and unlikely to discover globally competitive configurations. To address this, we adopted Bayesian optimization with the Tree-structured Parzen Estimator (TPE) to automate hyperparameter search and stabilize performance across architectures. The TPE procedure and its acquisition rule (maximizing the ratio

l / g

) are detailed in the next subsection with Equations (6)–(10).

3.3. Spatiotemporal Predictive Modeling Based on Bayesian Optimization

Because surface deformation inherently contains both temporal information and spatial features, regression tasks in deep learning can be designed with separate spatial feature extraction modules and temporal dependency modeling modules, thereby jointly capturing spatiotemporal patterns and modeling their relationships. In this study, we formulated surface deformation prediction as a regression task, applying deep learning methods to model the target deformation field. To achieve reliable modeling and analysis, we introduced basic convolutional networks (e.g., CNN) and GCNs, as well as LSTM and Transformer modules, and we also constructed six hybrid deep learning architectures: CNN + LSTM, CNN + Transformer, GCN + LSTM, GCN + Transformer, ResNet + LSTM, and ResNet + Transformer. In this design, CNNs and GCNs are responsible for capturing the spatial features of displacement at high-coherence points for each time step (cf. Equation (2)), while LSTMs and Transformers focus on modeling the temporal dependencies of sequential data and summarize the history into a context vector

c_{T}

(cf. Equation (4)), which is then decoded into F-step forecasts (cf. Equation (5)).

The performance of the models derived from the above architectures is highly dependent on their hyperparameter configurations. For the structurally complex spatiotemporal prediction models considered in this study, numerous hyperparameters were involved, such as the learning rate, network depth, number of layers, dropout rate, and batch size, which interact with each other in nonlinear ways. Traditional manual tuning or grid search methods face a combinatorial explosion problem in high-dimensional spaces, making them inefficient and often incapable of identifying optimal configurations. To systematically and efficiently determine the best set of hyperparameters, this study adopted the TPE algorithm within the Bayesian optimization framework.

Bayesian optimization is an effective global optimization framework for identifying the optimum of a “black-box” function, and it is particularly suited for objective functions with high evaluation costs, such as training a deep learning model. Its core idea is to construct a probabilistic surrogate model of the objective function based on the results of a limited number of evaluations, and this model was used to guide the selection of the next most promising evaluation point. This process is governed by an acquisition function, which balances the trade-off between “exploitation” in regions where the current model predicts near-optimal values and “exploration” in regions with high uncertainty.

The TPE, as an advanced Bayesian optimization algorithm, differs from Gaussian-process-based approaches in the way its surrogate model is constructed. Instead of modeling

p (y ∣ λ)

(the probability of an objective value y given a hyperparameter configuration

λ

, TPE reformulates the problem by modeling

p (λ ∣ y)

, i.e., the probability distribution of hyperparameters that yield a given objective value. This reformulation enables TPE to flexibly handle complex and conditionally dependent hyperparameter spaces (see, for example, cases where the number of network layers determines the dimensionality of subsequent channel parameters), which is consistent with the hyperparameter structure of the models in this study. The detailed optimization procedure is as follows.

(1): Objective and search space. Define the validation loss to be minimized as $y = L_{val} (λ)$ , where $λ \in S$ denotes a hyperparameter configuration, and $S$ is the search space.
(2): Initialization and historical record. Maintain a history $H_{n} = {(λ^{(j)}, y^{(j)})}_{j = 1}^{n}$ of evaluated configurations and losses. Initialize $H_{n}$ with random samples.
(3): Data partitioning and probabilistic modeling. At each iteration, compute the quantile threshold:

$y^{*} = quantile ({y^{(j)}}_{j = 1}^{n}, γ), γ \in [0.15, 0.25],$

(6)

and split the observations into “good” and “bad” sets, $L = {y < y^{*}}$ , and $G = {y \geq y^{*}}$ . Fit KDEs for the two conditional densities:

$l (λ) \approx p (λ ∣ y < y^{*}),$

(7)

$g (λ) \approx p (λ ∣ y \geq y^{*}) .$

(8)
(4): Acquisition and candidate selection. Using the improvement threshold $y^{*}$ , the expected improvement (EI) for a candidate $λ$ is as follows:

$EI (λ) = \int_{- \infty}^{y^{*}} (y^{*} - y) p (y ∣ λ) d y,$

(9)

and, under the TPE factorization, maximizing EI is equivalent to maximizing the likelihood ratio $l / g$ :

$λ^{next} = arg max_{λ \in S} \frac{l (λ)}{g (λ)} .$

(10)
(5): Iterative optimization. Evaluate $y^{new} = L_{val} (λ^{next})$ , update $H_{n + 1} = H_{n} \cup {(λ^{next}, y^{new})}$ , and repeat Steps (3)–(5) until the evaluation budget is exhausted. The best configuration observed (lowest y) is returned as the optimum.

Ultimately, TPE returns the hyperparameter configuration with the lowest observed loss during the entire optimization process as the optimal solution. By leveraging Bayesian inference, the method intelligently exploits historical information to narrow the search space, focusing computational resources on more promising regions of the hyperparameter domain. As a result, it identifies superior model configurations with far fewer evaluations than conventional methods. Figure 2 illustrates the spatiotemporal prediction framework based on TPE.

Figure 2. Spatiotemporal prediction framework based on Bayesian optimization. The data preprocessing in the figure means converting sparsely distributed high-coherence point data into image data. Spatial encoding means using CNN or GCN to extract spatial features of displacement. Spatiotemporal modeling means using LSTM or Transformer to implement time series modeling of the spatial features extracted in the previous step. TPE optimization implements hyperparameter selection for different models in this process.

This study formulates the prediction of mining-induced deformation as a spatiotemporal regression problem. Given the spatial heterogeneity and long-term non-linear temporal accumulation of deformation fields [50,51], we set up a controlled comparison using three complementary spatial encoders and two mainstream temporal models. On the spatial front, CNNs characterize local neighborhood patterns, ResNets extract deeper, multi-scale features, and GCNs model non-Euclidean topological correlations based on high-coherence point adjacency relationships. Temporally, LSTMs capture sequential dependencies and long-term memory, while Transformers utilize self-attention to model global coupling and non-linear time-varying relationships across time steps. Based on this, we designed six hybrid architectures to encompass the primary technical approaches in contemporary spatiotemporal modeling and provided a comparative baseline. To quantify the independent contributions of each design choice, we employed a controlled modular evaluation. Under identical data partitioning and training conditions, we fixed the temporal module while rotating the spatial module to quantify spatial representation gains, as well as vice versa to quantify temporal modeling gains. Using CNN + LSTM as the minimal deep learning baseline [52,53], we compared the manually tuned and TPE auto-tuned models within the same comparative framework.

4. Experiment and Results

4.1. Experimental Setup

The entire TPE optimization process was configured to perform 50 trials. Initially, 20 random exploratory trials were conducted to construct the surrogate probability model, after which the guided optimization stage began. In this stage, the top 25% of observed loss values (i.e., the quantile threshold

γ = 0.25

) were used to fit the two conditional density models. The next hyperparameter candidate was selected by maximizing the EI acquisition function, which in TPE is proportional to the ratio

l (λ) / g (λ)

. The experiments were conducted in two groups: (i) models trained with hyperparameters tuned manually based on empirical experience, and (ii) models trained with hyperparameters optimized using TPE. Each run employed the same train/test split but different random weight initializations, enabling us to compute the mean performance over 10 runs and thereby reduce bias caused by random initialization.

To simulate a realistic operational forecasting environment, data partitioning employs a strict time-preserving strategy. To prevent temporal information leakage, the time series data is sequentially segmented: the initial 80% of time steps form the training set, the subsequent 10% constitute the validation set, and the final 10% are reserved as the test set. Crucially, to avoid statistical information leakage, all normalized parameters (mean and standard deviation) were computed solely based on the training subset and applied to both the validation and test sets. Input samples were constructed using a sliding window approach, taking the past T time phases as input to predict the next F time phases. Spatially, this study employed a fixed spatial domain configuration. As the objective is temporal extrapolation within a specified monitoring area rather than cross-domain transfer, the spatial grid blocks remain consistent across all subsets. This setup enables the model to learn the specific spatial heterogeneity of the study area while rigorously evaluating its generalization capability along the temporal axis.

In total, two groups of comparative experiments were conducted across the six architectures, yielding 3060 models for result analysis. For the TPE-based hyperparameter optimization experiments, given the substantial computational cost, we employed PyTorch (2.5.1)’s DistributedDataParallel (DDP) [54] for distributed training, which version is 2.5.1. Within the GPU cluster environment, DDP efficiently allocated training tasks across multiple GPUs for parallel execution, resulting in a substantial reduction in training time compared to single-GPU training.

4.2. Performance Comparison of Spatiotemporal Prediction Results

The poor performance of a given architecture may stem either from its inherent unsuitability for the task or from inadequate hyperparameter tuning. In studies combining deep learning and remote sensing to investigate hazards induced by natural resource exploitation, existing comparisons across models have often relied on manually adjusted hyperparameters. Such manual tuning is inherently subjective and contingent, potentially leading to unfair comparisons among models [40,55]. For the architectures considered in this study, CNN + LSTM, CNN + Transformer, GCN + LSTM, GCN + Transformer, ResNet + LSTM, and ResNet + Transformer, our experimental results present the performance under two conditions: models trained with empirically tuned hyperparameters and models trained with hyperparameters optimized through automated optimization.

Figure 3a–d and Table 1 compare the distributions of the six architectures across three evaluation metrics, the Root Mean Squared Error (

R M S E

), Mean Absolute Error (

M A E

), and the coefficient of determination (

R^{2}

). Table 1 presents the respective mean values for these metrics. The results demonstrate that the models optimized via TPE consistently outperformed those manually tuned. In this study, statistical testing was required on all 18 TPE-versus-manual mean differences. To avoid the inflation of Type I errors due to multiple comparisons (without correction and under

α = 0.05

and independence, the probability of observing at least one false positive is

1 - {(0.95)}^{18} \approx 0.60

), we applied a two-sided Welch’s t-test [56] followed by Benjamini–Hochberg (BH) correction [57] to control the false discovery rate (FDR) at 0.05 [58]. The results showed that all 18 FDR-adjusted p-values (

p_adj

) were significant, indicating that the performance differences between the two tuning strategies were not attributable to chance.

Figure 3. Comparison of model performance using TPE optimization and manual parameter tuning. (a–c) The performance of all models using

R M S E

,

R^{2}

, and

M A E

, respectively. (d) Performance improvement after TPE-based hyperparameter optimization compared with manual tuning. CL, CT, GL, GT, RL, and RT denote CNN + LSTM; CNN + Transformer; GCN + LSTM; GCN + Transformer; ResNet + LSTM; and ResNet + Transformer, respectively. In the diagram (a–c), different colored bands represent the positions of different architectures on the X-axis, and statistical indicators on the same X-axis are placed in the same colored band box.

Table 1. The mean values of evaluation metrics.

The forest plot in Figure 4 shows that, across all 18 paired comparisons involving six architectures and three evaluation metrics (

R M S E

,

M A E

, and

R^{2}

), the mean differences were consistently positive, with their 95% confidence intervals entirely located to the right of zero. The corresponding FDR-adjusted p-values (Benjamini–Hochberg) were all significant. Taken together with the violin plot distribution comparisons, these findings confirm that TPE-based hyperparameter optimization yields consistent and statistically robust average performance improvements across all architectures and metrics. This result provides strong evidence that, in complex tasks (such as remote sensing-based spatiotemporal prediction), hyperparameter optimization is both a fundamental and indispensable step for fully exploiting the model’s potential.

Figure 4. Average performance improvement of TPE over manual tuning. Each row corresponds to an architecture: the first column lists the architecture; the second shows the mean difference and its 95% confidence interval (CI) (

R M S E

/

M A E

: Manual–TPE;

R^{2}

: TPE–Manual); the third shows horizontal error bars with the zero reference as a dashed line; and the fourth reports the FDR-corrected p-value and significance marker (red dot), as well as the conservative lower bound (LB, 95%). All

p_adj

values are significant and no CI crosses zero, indicating consistent and robust gains from TPE across architectures and metrics.

4.3. Optimization Process Analysis

We further compared the optimization processes across all experimental architectures, as illustrated in Figure 5. Panels (a) and (b) fix the temporal encoding module, while Panels (c), (d), and (e) fix the spatial encoding module. The convergence curves (dashed lines) for all architectures exhibit a progressively denser distribution, which reflects a typical characteristic of the Bayesian optimization paradigm. In the initial exploratory phase (e.g., Trials 0 to 15), candidate hyperparameters are sampled over a wide range to construct a surrogate model of the loss function, resulting in dispersed distributions along the horizontal axis and large fluctuations along the vertical axis of validation loss. As the number of trials increases, TPE gradually identifies promising low-loss regions, and subsequent samples become increasingly concentrated in these regions. Correspondingly, the curves show reduced vertical fluctuations and greater density, indicating that the optimization process transitions from exploration to exploitation, with the surrogate model progressively refining its characterization of low-loss regions.

Figure 5. Validation loss curve distribution during TPE optimization process. (a,b) Comparison of the optimisation process for fixed-time encoding modules. (c–e) Optimisation process during fixed-space encoding. Dashed lines denote 10 TPE optimisation iterations, while solid lines represent the average effective loss across 10 optimisation runs. The meanings of CL, CT, GL, GT, RL, and RT correspond to the abbreviations in Figure 3. Grey and red dashed lines serve as reference lines. H1, H2, and V1–V4 are auxiliary lines illustrating the density of loss curves across different architectures. The green auxiliary lines in (c–e) depict the iteration speeds of each architecture.

Referring to the red dashed lines in Figure 5a,b, a comparison with the horizontal auxiliary lines H1 and H2, and the vertical auxiliary lines V1 and V2 revealed that the convergence curves of the Transformer-based architecture were generally more densely clustered, whereas those of the LSTM-based architecture were more dispersed. Furthermore, examining the three experimental intervals separated by grey dashed lines—(0, 10), (10, 30), and (30, 40) revealed that the Transformer architecture’s mean loss curve begins flattening earlier, specifically within the (0, 10) interval. This phenomenon is more pronounced in experiments comparing fixed feature extraction modules, as shown in Figures (c)–(e). The grey dashed line at trail = 30 serves as the reference line, with the light green line acting as an auxiliary indicator. The Transformer-based architecture stabilises before trial = 30, whereas the LSTM-based architecture converges towards stability after 30 iterations. Furthermore, the comparison between V3 and V4 also demonstrates that the Transformer-based architecture exhibits greater stability in this experiment. It is noteworthy that across all architectures, the results from ten independent optimisation runs consistently converged to the same region, highlighting the robustness of the optimisation outcomes.

In addition, comparison of Figure 5a,b shows that, within the (0, 10) interval, the mean loss curve of Transformer-based architectures first flattens and then slightly decreases. Based on this observation, we hypothesize that, in the hyperparameter space of this task, Transformers are more likely to form accessible high-quality regions. These are characterized by a broad and easily reachable suboptimal plateau and a narrower, deeper expected optimal core (see Figure 6a), making the optimization process highly deterministic and efficient. By contrast, the relatively sparse loss curves of LSTM-based architectures throughout the entire process suggest that their hyperparameter space is not a single deep valley but rather a relatively flat landscape containing multiple suboptimal regions with comparable performance (see Figure 6b). This indicates a more stochastic optimization process, a hypothesis consistent with the patterns observed in Figure 5.

Figure 6. Schematic diagram of the parameter space of different architectures. (a) Transformer-based architectures exhibit a broad plateau, as shown by the arrow, and a narrow optimal core close to the plateau. (b) LSTM-based architectures have a relatively flat landscape containing multiple suboptimal regions.

The above experimental results and analyses indicate that, with the aid of advanced optimization algorithms such as TPE, Transformer-based architectures are more likely to achieve superior validation performance during the optimization process. Examination of the optimal loss values further demonstrates that the temporal modeling capacity of Transformers surpasses that of LSTMs. For spatiotemporal prediction in natural resource exploitation areas, Transformers should, therefore, be considered the preferred temporal encoding module, coupled with appropriate hyperparameter optimization.

4.4. Deformation Susceptibility Assessment and Reliability Quantification

Planners in natural resource extraction zones need more than a snapshot of current surface deformation susceptibility risk (potential risk). They need a clear view of how that risk will evolve. Spatiotemporal prediction provides this capability. Based on the comparative evaluation of model performance, we selected the ResNet + Transformer architecture as the candidate solution for engineering applications. We employed this architecture and its Tree-structured Parzen Estimator optimized configuration to assess deformation susceptibility across the study area.

To reduce the subjectivity of manually setting thresholds, this study used a distribution-driven quantile thresholding method to determine susceptibility grades. We constructed a prediction ensemble from the Top-10 models obtained during the optimization stage, and we denote the ensemble size as

K = 10

. For each pixel x, we compute the ensemble median displacement as follows:

m (x) = {median}_{k = 1, \dots, K} d_{k} (x),

and we use its magnitude

| m (x) |

as a measure of deformation intensity. Let

Ω_{valid}

denote the set of valid pixels where all ensemble members are available. We then compute the global quantile thresholds on valid pixels as follows:

Q_{τ} = {Quantile}_{τ} ({| m (x) | : x \in Ω_{valid}}), τ \in {85, 95, 99},

consequently, the surface deformation susceptibility within the study area is categorized into four levels, corresponding to the intervals

| m | < Q_{85}

,

Q_{85} \leq | m | < Q_{95}

,

Q_{95} \leq | m | < Q_{99}

, and

| m | \geq Q_{99}

, respectively: Low, Medium, High, and Highest. The spatial distribution of surface deformation susceptibility is mapped in Figure 7a to illustrate this ensemble consensus.

Figure 7. Deformation susceptibility and reliability based on ensemble prediction. (a) Susceptibility map derived from the ensemble median

| m (x) |

with global thresholds

Q_{85}

,

Q_{95}

, and

Q_{99}

. (b) Intra-class confidence score for the assigned susceptibility level in (a), defined as the probability mass of that class derived from exceedance probabilities. (c) Ensemble predictive uncertainty of displacement magnitude. (d) Ensemble rank variance of susceptibility class assignment.

A single susceptibility map cannot express the intrinsic variability of the predictions. We quantified uncertainty using three complementary ensemble statistics: (i) exceedance probabilities (and the resulting class confidence), (ii) predictive standard deviation of displacement magnitude, and (iii) the variance of susceptibility ranks across ensemble members. For each pixel x and the k-th model in the ensemble, let the predicted displacement be

d_{k} (x)

and let

| d_{k} (x) |

denote its magnitude. After defining susceptibility levels using

| m (x) |

and the quantiles

Q_{85}

,

Q_{95}

, and

Q_{99}

, we used the ensemble to estimate the probability that each pixel reaches different intensity levels. For every pixel and each quantile

Q_{τ}

with

τ \in {85, 95, 99}

, we defined the exceedance probability:

P_{τ} (x) = \frac{1}{K} \sum_{k = 1}^{K} I (| d_{k} (x) | \geq Q_{τ}),

where

I (\cdot)

is the indicator function, and

P_{85} (x)

,

P_{95} (x)

, and

P_{99} (x)

represent the probability that a pixel reaches at least Medium, High, or Highest displacement intensity across the K models.

We then defined the probability mass for each susceptibility class as Low

= 1 - P_{85}

, Medium

= P_{85} - P_{95}

, High

= P_{95} - P_{99}

, Highest

= P_{99}

. For each pixel, we used the probability associated with its assigned class as an intra-class confidence score, reflecting how strongly the ensemble supported that classification (Figure 7b). In addition, we computed the standard deviation of

{| d_{k} (x) {|}}_{k = 1}^{K}

to quantify the numerical uncertainty in displacement magnitude (Figure 7c), and we computed the variance of the susceptibility ranks obtained by thresholding each

| d_{k} (x) |

to quantify the instability in class assignment (Figure 7d).

As shown in Figure 7c,d, the standard deviation in Figure 7c measures the uncertainty in displacement magnitude. High values indicate substantial disagreement in predicted displacement amplitude, suggesting unstable deformation intensity, even when the susceptibility level is consistent. The variance in Figure 7d measures the stability of class assignment. Larger variance indicates stronger oscillation among the Low, Medium, High, and Highest categories, implying weaker agreement across the models.

To unify susceptibility and numerical uncertainty within a single interpretive framework for engineering decision making, we combined the susceptibility classes in Figure 7a with the numerical uncertainty in Figure 7c to produce a bivariate map, as shown in Figure 8. The legend uses a two-dimensional matrix layout. The vertical axis encodes increasing deformation susceptibility from bottom to top, which is consistent with the classes shown in Figure 7a. The horizontal axis groups pixels by the quantiles of the prediction standard deviation, increasing from left to right to represent rising uncertainty. This design merges deformation susceptibility with the stability of model judgments into a single interpretive plane. Robust high susceptibility zones appear in the upper left quadrant, where predicted displacement is large and model disagreement is low. These areas represent genuine deformation risk and should receive the highest priority for engineering intervention. Unstable high susceptibility zones appear in the upper right quadrant, where the median prediction indicates high susceptibility but the ensemble shows strong divergence. These cases likely reflect either model instability or complex noise. Field verification should, therefore, precede any structural reinforcement.

Figure 8. Bivariate mapping of deformation susceptibility levels and uncertainty. The susceptibility levels are consistent with those in Figure 7a, divided into Low, Medium, and High. Uncertainty is determined by dividing Figure 7c into Low, Medium, and High Uncertainty. In the figure, the vertical direction of the different colored squares represents the susceptibility level, with the level increasing from bottom to top. The horizontal direction of the squares represents the uncertainty level, with the level increasing from left to right.

This bivariate representation extends the standalone susceptibility map into a more interpretable risk surface. It identifies not only the areas with the highest deformation potential, but also the locations where model judgments remain unstable and require greater safety margins in subsequent field validation and monitoring design.

5. Discussion

A key component of this study was the use of a Tree-structured Parzen Estimator for hyperparameter optimization, as detailed in Section 3.3. We applied hyperparameter optimization under comparable conditions to evaluate six model architectures. The search had a high computational cost and required several thousand training and evaluation iterations. To keep the search tractable, we defined a specific readout operation, described in Section 3.2 and expressed in Equation (3), to encode the two-dimensional spatial feature map at each time step (denoted

H_{t}

). We then pooled and flattened this representation to obtain a vector

z_{t}

. The vector was passed to a temporal encoder implemented with a Transformer or a similar architecture. This readout strategy improved computational efficiency at the expense of local spatial detail, enabling an extensive hyperparameter search.

Based on this design, the Results section first shows that, across six architecture classes and three evaluation metrics, the TPE produced substantial gains in average performance relative to manual tuning. We then selected the ResNet + Transformer model as the engineering solution and used the Top-10 prediction ensemble to generate deformation susceptibility maps for the mining area, as shown in Figure 7 and Figure 8. To clarify where performance gains occurred and how they affect risk representation, we analyzed pixel-level residuals using the same criteria as those in the Results section. Signed residuals revealed systematic bias and spatial patterns, while absolute residuals quantified error magnitude and spatial hotspots.

5.1. Spatial Distribution Characteristics of Errors

To locate where the model improved, to identify the underlying causes, and to clarify their implications for susceptibility mapping, we analyzed pixel-level residuals from two complementary perspectives. The signed residuals in Figure 9a reveal systematic biases and spatial patterns. The spatial structure shows that the errors were not random. Patch-like overestimation and small pockets of underestimation appeared along the fault zone. The open pit mines, especially the active Hambach site, exhibited pronounced local overestimation and underestimation, whereas the Garzweiler mine and the decommissioned Inden site remained comparatively stable.

Figure 9. Spatial residual diagnostics for the TPE-ResNet+Transformer model. (a) Spatial distribution of signed residuals (red: overestimation; blue: underestimation). (b) Residual Q–Q plot with the theoretical line, OLS fit, and robust fit; dashed gray lines mark symmetric quantiles (left tail: Q1/Q2.5/Q5; right tail: Q95/Q97.5/Q99), and hollow symbols highlight extreme tails (≥Q99 or ≤Q1).

The Q–Q diagnostic in Figure 9b shows that the central portion of the point cloud aligned more closely with a robust fit but deviated from the theoretical normal line. The upper tail rose sharply at high quantiles, and the lower tail showed a weaker yet noticeable deviation. These features indicate genuine heavy tailed behavior and extreme residuals. Their spatial distribution suggests that extreme errors likely occurred within mining areas or along fault zones, reflecting the intrinsic complexity of deformation processes in these regions. These findings point to future improvements through deformation modeling tailored specifically to mining zones and fault-controlled structures.

5.2. Effects of TPE Optimization on Error Structure

Figure 10a–l compare the spatial distribution of absolute residuals for six model architectures under manual tuning and TPE optimization. The residual patterns show that prediction errors were not randomly distributed. They concentrated along geological discontinuities, such as fault zones and mining boundaries, revealing strong structural dependence. Comparing the pre- and post-optimization residual fields identifies the specific areas where the optimization improved performance. Panels a–d and b–e show that the optimization reduced the spatial density of high residuals. Local zoomed views further confirm substantial reductions along the fault and in the southwestern hotspot near Inden, indicating a pronounced contraction of high-error regions. These results demonstrate that the optimization not only improved average performance metrics, but also enhanced the spatial compactness of prediction errors.

Figure 10. Residual distributions of models with TPE optimization versus manual tuning. (a–c, g–i) The residual distributions after manual parameter tuning. (d–f, j–l) The residual distributions after TPE optimization. The area enclosed by the red dashed circle/rectangle and the black rectangle represents the regions where the spatial residuals decreased before and after optimization. The thicker red dashed line indicates the trend of the fault. The CNN + LSTM, CNN + Transformer, GCN + LSTM, GCN + Transformer, ResNet + LSTM, and ResNet + Transformer models are abbreviated as CNN + L, CNN + T, GCN + L, GCN + T, ResNet + L, and ResNet + T, respectively. The abbreviations are prefixed with M, such as M-CNN + L, denoting a manually tuned model, and prefixed with TPE, such as TPE-CNN + L, representing a TPE-optimized model.

The statistical significance demonstrated in Section 4.2 holds equal importance for practical operational monitoring. Figure 4 presents the mean difference between the manual and TPE methods alongside their 95% confidence intervals. LB (95%) serves as a conservative lower bound for relative improvement, characterizing the minimum reproducible performance gain under repeated training conditions. This lower bound is more relevant to engineering applications than simple p-values, as monitoring and management require predictable error convergence margins. Mine management prioritizes error behavior in structurally sensitive zones, such as fault zones and mining boundary areas. Residual space diagnostics in Figure 10 reveal that optimization compresses high-residual regions near faults and hotspots, thereby reducing misclassification risks in critical zones and enhancing the stability of field verification and monitoring deployment priority assessments.

The architecture comparison indicates that residual behavior near faults varied across models. The ResNet-based models outperformed CNN and GCN architectures in suppressing boundary errors along fault zones. This suggests that deeper ResNet structures extracted features more effectively in regions characterized by strong geological gradients. The advantage is evident in the spatial residual patterns along the fault, where the red dashed line marks the mapped fault trace in Figure 10.

Spatial overlay analysis of physical uncertainty (standard deviation, Figure 7c in Section 4.4) and absolute residuals (Figure 10) reveals a high degree of spatial consistency between areas of poor prediction accuracy and regions of high uncertainty. Areas with high residuals (lower prediction accuracy) and areas with high uncertainty (significant discrepancies in model perception) are spatially consistent. This indicates that the uncertainty in the model output is not statistical noise, but rather a true reflection of the physical entities. Specifically, high-uncertainty areas in the model point to geological discontinuities (faults) and areas of human disturbance (such as the Hambach mining area), demonstrating that the model can provide a basis for engineering decisions based on high-uncertainty indicators.

Building on this analysis, we jointly used residuals and uncertainty to interpret the deformation susceptibility results. Regions that fell in the high quantiles of both absolute residuals and uncertainty represent locations where the model struggled to fit displacement magnitude and to produce stable susceptibility classifications. These areas should receive priority for dense field monitoring, such as GNSS or corner reflectors, to obtain direct physical validation. Regions with high uncertainty but negligible residuals typically lie near quantile thresholds, where the model captured the average displacement level but remained sensitive to class boundaries. Local densification of observations and threshold recalibration can improve the stability of classifications in these zones. Areas with low uncertainty and low residuals provide stable and reliable predictions and can be incorporated into routine management with only low frequency inspections.

To investigate how deformation characteristics in different regions control susceptibility, we analyzed the spatial patterns of exceedance probabilities computed in Section 4.4. As shown in Figure 11, increasing the threshold from

Q_{85}

to

Q_{99}

produced sharply different responses across regions. Within the mining area, the number of exceedance pixels declined as the threshold increased. Despite the evident broad subsidence in mining fields, the physical displacement magnitude remained concentrated in the medium-to-high range and did not produce extreme outliers. In contrast, pixels along the fault zone retained a clear high probability belt, even at the

Q_{99}

threshold. This spatial contrast demonstrates the higher physical intensity and stronger spatial concentration of fault-controlled deformation. It also shows that the fault zone is more hazardous than the mining area. Figure 11c illustrates that this zone coincides with a region classified as extremely high susceptibility with moderate uncertainty in Figure 8. This location requires supplemental monitoring beyond InSAR to strengthen observational coverage.

Figure 11. Spatial distribution of exceedance probabilities for three deformation severity thresholds. Panels (a–c) show the exceedance probabilities

P_{85} (x)

,

P_{95} (x)

, and

P_{99} (x)

, defined as the fraction of ensemble members whose predicted displacement magnitudes

| d_{k} (x) |

exceed the global 85th, 95th, and 99th percentile thresholds derived from

| m (x) |

. Higher values indicate a greater likelihood that a pixel reaches at least the corresponding deformation intensity level across the ensemble.

Specifically, regions exhibiting both high probability and high residuals should be accorded priority for denser monitoring and field verification; regions demonstrating high probability but negligible high residuals are frequently characterized by threshold-sensitive belts and are more effectively addressed through initial observation and local calibration; regions exhibiting low probability and low residuals can be placed under routine management. This workflow converts statistical improvements into executable graded actions, enhancing both interpretability and usability of risk maps for planning and operations.

From a mechanistic perspective, the non-stationarity of faults and mining zones imposes cross-scale and cross-boundary requirements on feature extraction. Convolutional neural networks, due to their local receptive fields and invariant weight translations [59], have inherent limitations in cross-fault context fusion and error control near boundaries, and they are prone to generating large residuals on both sides of the fault. Graph convolutional networks can suppress local noise through graph structures, but they tend to over-smooth at sharp boundaries, weakening fault gradients [60]. Residual networks alleviate gradient vanishing through multi-scale residual paths and provide channels for high- and low-level feature fusion [61], while the Transformer’s long-range dependency modeling helps capture the coexistence of slow subsidence and local abrupt changes [62]. This also explains why, under TPE optimization, the combination of ResNet and Transformer has a more significant residual compression effect near fault zones and mining boundaries.

5.3. Limitations and Prospects

Although the TPE optimization model proposed herein demonstrates excellent performance within the study area, several challenges remain in practical engineering applications that require focused resolution in future work. Firstly, InSAR time series inevitably suffer from atmospheric delay, coherence decay, and disentanglement errors. These errors amplify at low deformation magnitudes and near structural boundaries, thereby elevating prediction uncertainty and compromising the stability of susceptibility grading. Subsequent work will, therefore, incorporate pixel-level observation uncertainty and external atmospheric correction products into pre-processing and training weighting, explicitly propagating observation errors during threshold calculation and probabilistic mapping stages [63,64,65]. Secondly, the Sentinel-1 sequence and quality screening introduce irregular temporal sampling and temporal gaps. Subsequent work will explicitly encode temporal intervals and incorporate continuous-time or gap-aware temporal modeling strategies to enhance extrapolation robustness under temporal gaps and non-uniform sampling conditions [66]. Moreover, the current framework relies primarily on data-driven learning, without explicitly incorporating physical consistency constraints for mining subsidence, strata compression, and fault activity into the model. Subsequent work will employ physics-guided regularization or hybrid modeling to integrate key mechanical priors into the network, thereby reducing non-physical predictions and enhancing cross-scenario transferability [67]. Finally, while validated within the Rhineland coalfield, the model may encounter distribution drift when applied across mining areas or time periods. Subsequent work will employ cross-regional spatial hold-out and cross-period temporal hold-out for extrapolation assessment. This will be combined with integrated uncertainty and dataset bias diagnostics to identify potential failure zones, thereby providing more robust prioritization for monitoring densification and field verification [68]. Building upon these enhancements, we shall further incorporate mining plans, fault geometry, and other driving factors to construct a more comprehensive multimodal prediction framework. This will improve the reliability statements required for engineering decision making.

6. Conclusions

We developed a unified framework that links spatiotemporal prediction of surface deformation in mining areas with deformation susceptibility assessment by combining deep learning architectures with a Tree-structured Parzen Estimator Bayesian optimization scheme. The framework not only optimizes model performance, but also uses the ensemble of models generated during optimization to evaluate deformation susceptibility and to quantify the associated uncertainty.

Six deep learning frameworks, CNN + LSTM, CNN + Transformer, GCN + LSTM, GCN + Transformer, ResNet + LSTM, and ResNet + Transformer were systematically evaluated within this framework. The experimental results were statistically analyzed with a two-sided Welch t-test. Multiple comparison corrections were conducted at False Discovery Rate 0.05 using the Benjamini–Hochberg method. From these, it can be observed that TPE yields a statistically significant and consistent average performance improvement compared to manual parameter tuning. Based on these results, we selected the ResNet+Transformer model as the primary architecture for deformation susceptibility assessment and used the ten best-performing models from the optimization stage as the prediction ensemble. The standard deviation and variance of their inference results served as the basis for uncertainty quantification.

More importantly, spatial analyses of both residual types indicated that model errors are predominantly concentrated within mining areas and fault zones, thereby guiding model refinement efforts. Concurrently, comparative analysis of absolute residuals between manually adjusted and TPE-optimized models further confirms that the TPE method structurally mitigates errors in regions of complex deformation. These findings demonstrate that automated hyperparameter optimization constitutes an essential step in enhancing the spatiotemporal predictive capability of surface deformation models. Moreover, analysis of the super-threshold probability revealed distinct response characteristics across different zones. The mining area exhibited pronounced threshold sensitivity, reflecting the widespread yet moderate magnitude of mining-induced subsidence. Conversely, fault zones retained distinct high-probability bands, even at extreme thresholds, demonstrating that fault-controlled deformation possesses greater physical intensity and concentration, thus rendering it a more hazardous potential hazard source than the mining area.

The spatial relationship between physical uncertainty, represented by the standard deviation, and absolute residuals shows that the model’s uncertainty captures the physical complexity of fault-controlled and human-disturbed zones rather than statistical noise. After performing susceptibility assessment and uncertainty quantification, engineering decisions should incorporate the spatial characteristics of both uncertainty and residuals. The proposed framework translates statistical metrics into operational risk management measures and increases the practical utility of susceptibility maps in natural resource extraction planning.

Author Contributions

Conceptualization, data curation, formal analysis, methodology, software, supervision, validation, visualization, writing—original draft and writing—review and editing, M.L.; conceptualization, funding acquisition, methodology, supervision and writing—review and editing, S.L.; conceptualization, methodology and writing—review and editing, T.L.; writing—review and editing, W.W.; writing—review and editing, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (Grant Nos. 42530710, 42377453, and 42530236); the Science and Technology Innovation Program of Hunan Province (Grant No. 2021RC4037), and the China Scholarship Council (Grant No. 202308430215).

Data Availability Statement

Data and code can be obtained by contacting the corresponding authors.

Acknowledgments

This research was partially supported by SARKI4Tagebaufolgen project financed via Bundesministerium für Wirtschaft und Energie (BMWE) in Germany. Maoqi Liu acknowledges that Mahdi Motagh assisted us in completing the final proofreading of the manuscript and offered some suggestions, and appreciates the financial support from the China Scholarship Council.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could influence the work reported in this study.

Abbreviations

The following abbreviations are used in this manuscript:

InSAR	Interferometric Synthetic Aperture Radar
SAR	Synthetic Aperture Radar
EGMS	European Ground Motion Service
IDW	Inverse Distance Weighting
HCP	High-Coherence Point
CNN	Convolutional Neural Network
GCN	Graph Convolutional Network
LSTM	Long Short-Term Memory
MLP	Multi-Layer Perceptron
ResNet	Residual Network
Transformer	Transformer Encoder–Decoder Architecture
TPE	Tree-Structured Parzen Estimator
BO	Bayesian Optimization
KDE	Kernel Density Estimation
EI	Expected Improvement
$R M S E$	Root Mean Squared Error
$M A E$	Mean Absolute Error
$R^{2}$	Coefficient of Determination
FDR	False Discovery Rate
BH	Benjamini–Hochberg (procedure)
CI	Confidence Interval
OLS	Ordinary Least Squares
Q–Q	Quantile–Quantile (plot)

References

Ohenhen, L.O.; Zhai, G.; Lucy, J.; Werth, S.; Carlson, G.; Khorrami, M.; Onyike, F.; Sadhasivam, N.; Tiwari, A.; Ghobadi-Far, K.; et al. Land subsidence risk to infrastructure in US metropolises. Nat. Cities 2025, 2, 543–554. [Google Scholar] [CrossRef]
Ao, Z.; Hu, X.; Tao, S.; Hu, X.; Wang, G.; Li, M.; Wang, F.; Hu, L.; Liang, X.; Xiao, J.; et al. A national-scale assessment of land subsidence in China’s major cities. Science 2024, 384, 301–306. [Google Scholar] [CrossRef]
Haghshenas Haghighi, M.; Motagh, M. Uncovering the impacts of depleting aquifers: A remote sensing analysis of land subsidence in Iran. Sci. Adv. 2024, 10, eadk3039. [Google Scholar] [CrossRef]
Motagh, M.; Shamshiri, R.; Haghighi, M.H.; Wetzel, H.U.; Akbari, B.; Nahavandchi, H.; Roessner, S.; Arabi, S. Quantifying groundwater exploitation induced subsidence in the Rafsanjan plain, southeastern Iran, using InSAR time-series and in situ measurements. Eng. Geol. 2017, 218, 134–151. [Google Scholar] [CrossRef]
Kuang, X.; Liu, J.; Scanlon, B.R.; Jiao, J.J.; Jasechko, S.; Lancia, M.; Biskaborn, B.K.; Wada, Y.; Li, H.; Zeng, Z.; et al. The changing nature of groundwater in the global water cycle. Science 2024, 383, eadf0630. [Google Scholar] [CrossRef] [PubMed]
Wang, W.; Motagh, M.; Xia, Z.; Lu, Z.; Karimzadeh, S.; Zhou, C.; AV Shevchenko; Roessner, S. Characterization of transient movements within the Joshimath hillslope complex: Results from multi-sensor InSAR observations. PFG–J. Photogramm. Remote Sens. Geoinf. Sci. 2024, 92, 629–648. [Google Scholar] [CrossRef]
Herrera-García, G.; Ezquerro, P.; Tomás, R.; Béjar-Pizarro, M.; López-Vinielles, J.; Rossi, M.; Mateos, R.M.; Carreón-Freyre, D.; Lambert, J.; Teatini, P.; et al. Mapping the global threat of land subsidence. Science 2021, 371, 34–36. [Google Scholar] [CrossRef] [PubMed]
Ma, S.; Qiu, H.; Yang, D.; Wang, J.; Zhu, Y.; Tang, B.; Sun, K.; Cao, M. Surface multi-hazard effect of underground coal mining. Landslides 2023, 20, 39–52. [Google Scholar] [CrossRef]
Chen, J.; Dai, Z.; Dong, S.; Zhang, X.; Sun, G.; Wu, J.; Ershadnia, R.; Yin, S.; Soltanian, M.R. Integration of deep learning and information theory for designing monitoring networks in heterogeneous aquifer systems. Water Resour. Res. 2022, 58, e2022WR032429. [Google Scholar] [CrossRef]
Chang, T.; Yi, Y.; Jiang, H.; Li, R.; Lu, P.; Liu, L.; Wang, L.; Zhao, L.; Zwieback, S.; Zhao, J. Unraveling the non-linear relationship between seasonal deformation and permafrost active layer thickness. Npj Clim. Atmos. Sci. 2024, 7, 308. [Google Scholar] [CrossRef]
Chung, J.; Manga, M.; Kneafsey, T.; Mukerji, T.; Hu, M. Deep learning forecasts the spatiotemporal evolution of fluid-induced microearthquakes. Commun. Earth Environ. 2025, 6, 643. [Google Scholar] [CrossRef]
Reichstein, M.; Benson, V.; Blunk, J.; Camps-Valls, G.; Creutzig, F.; Fearnley, C.J.; Han, B.; Kornhuber, K.; Rahaman, N.; Schölkopf, B.; et al. Early warning of complex climate risk with integrated artificial intelligence. Nat. Commun. 2025, 16, 2564. [Google Scholar] [CrossRef]
He, Y.; Yan, H.; Yang, W.; Yao, S.; Zhang, L.; Chen, Y.; Liu, T. Time-series analysis and prediction of surface deformation in the Jinchuan mining area, Gansu Province, by using InSAR and CNN–PhLSTM network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 6732–6751. [Google Scholar] [CrossRef]
Zhang, X.; Chen, Q.; Yang, M.; Zhao, Z.; Zheng, Y.; Dai, Q.; He, Y.; Cai, D.; Xu, T. Surface deformation monitoring and prediction of Longtantian Open-pit Mine based on SBAS-InSAR and CNN-BiLSTM techniques. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 17706–17727. [Google Scholar] [CrossRef]
Ma, P.; Jiao, Z.; Wu, Z. Robust time-series InSAR deformation monitoring by integrating variational mode decomposition and gated recurrent units. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 18, 3208–3221. [Google Scholar] [CrossRef]
Rouet-Leduc, B.; Jolivet, R.; Dalaison, M.; Johnson, P.A.; Hulbert, C. Autonomous extraction of millimeter-scale deformation in InSAR time series using deep learning. Nat. Commun. 2021, 12, 6480. [Google Scholar] [CrossRef] [PubMed]
Cohen-Waeber, J.; Bürgmann, R.; Chaussard, E.; Giannico, C.; Ferretti, A. Spatiotemporal patterns of precipitation-modulated landslide deformation from independent component analysis of InSAR time series. Geophys. Res. Lett. 2018, 45, 1878–1887. [Google Scholar] [CrossRef]
Zhang, G.; Xu, Z.; Chen, Z.; Wang, S.; Cui, H.; Zheng, Y. Predictable condition analysis and prediction method of SBAS-InSAR coal mining subsidence. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5232914. [Google Scholar] [CrossRef]
Costantini, M.; Minati, F.; Trillo, F.; Ferretti, A.; Passera, E.; Rucci, A.; Dehls, J.; Larsen, Y.; Marinkovic, P.; Eineder, M.; et al. EGMS: Europe-wide ground motion monitoring based on full resolution InSAR processing of all Sentinel-1 acquisitions. In Proceedings of the IGARSS 2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; IEEE: New York, NY, USA, 2022; pp. 5093–5096. [Google Scholar]
Wang, Q.; Huang, R. RES-STF: Spatio temporal fusion of visible infrared imaging radiometer suite and Landsat land surface temperature based on Restormer. J. Remote Sens. 2024, 4, 0208. [Google Scholar] [CrossRef]
Li, X.; Zhou, Y.; Wang, F. Advanced information mining from ocean remote sensing imagery with deep learning. J. Remote Sens. 2022, 2022, 9849645. [Google Scholar] [CrossRef]
Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef] [PubMed]
He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2016; pp. 630–645. [Google Scholar]
Salehinejad, H.; Sankar, S.; Barfett, J.; Colak, E.; Valaee, S. Recent advances in recurrent neural networks. arXiv 2017, arXiv:1801.01078. [Google Scholar]
Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A search space odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2222–2232. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
Ma, P.; Wu, Z.; Zhang, Z.; Au, F.T. SAR-Transformer-based decomposition and geophysical interpretation of InSAR time-series deformations for the Hong Kong–Zhuhai–Macao Bridge. Remote Sens. Environ. 2024, 302, 113962. [Google Scholar] [CrossRef]
Liu, M.; Liu, X.; Wu, L.; Peng, T.; Zhang, Q.; Zou, X.; Tian, L.; Wang, X. Hybrid spatiotemporal graph convolutional network for detecting landscape pattern evolution from long-term remote sensing images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4413716. [Google Scholar] [CrossRef]
Zhang, Z.; Cui, P.; Zhu, W. Deep learning on graphs: A survey. IEEE Trans. Knowl. Data Eng. 2020, 34, 249–270. [Google Scholar] [CrossRef]
Zhou, C.; Ye, M.; Xia, Z.; Wang, W.; Luo, C.; Muller, J.P. An interpretable attention-based deep learning method for landslide prediction based on multi-temporal InSAR time series: A case study of Xinpu landslide in the TGRA. Remote Sens. Environ. 2025, 318, 114580. [Google Scholar] [CrossRef]
Long, S.; Liu, M.; Xiong, C.; Li, T.; Wu, W.; Ding, H.; Zhang, L.; Zhu, C.; Lu, S. Research on prediction of surface deformation in mining areas based on TPE-optimized integrated models and multi-temporal InSAR. Remote Sens. 2023, 15, 5546. [Google Scholar] [CrossRef]
Hüllermeier, E.; Waegeman, W. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Mach. Learn. 2021, 110, 457–506. [Google Scholar] [CrossRef]
Gal, Y.; Ghahramani, Z. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In Proceedings of the International Conference on Machine Learning (ICML), New York, NY, USA, 19–24 June 2016; PMLR. pp. 1050–1059. [Google Scholar]
Chen, C.; Dong, B.; Lin, J.; Shen, Z.; Fang, L.; Weng, Y.; Wang, K. Bayesian deep learning framework for updating landslide susceptibility assessment based on epistemic uncertainty with InSAR augmented samples. J. Rock Mech. Geotech. Eng. 2025. [Google Scholar] [CrossRef]
Cooper, A.F.; Lu, Y.; Forde, J.; De Sa, C.M. Hyperparameter optimization is deceiving us, and how to stop it. Adv. Neural Inf. Process. Syst. 2021, 34, 3081–3095. [Google Scholar]
Bischl, B.; Binder, M.; Lang, M.; Pielok, T.; Richter, J.; Coors, S.; Thomas, J.; Ullmann, T.; Becker, M.; Boulesteix, A.-L.; et al. Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2023, 13, e1484. [Google Scholar] [CrossRef]
Agrawal, P. A survey on hyperparameter optimization of machine learning models. In Proceedings of the 2024 2nd International Conference on Disruptive Technologies (ICDT), Greater Noida, India, 15–16 March 2024. [Google Scholar]
Hanifi, S.; Cammarono, A.; Zare-Behtash, H. Advanced hyperparameter optimization of deep learning models for wind power prediction. Renew. Energy 2024, 221, 119700. [Google Scholar] [CrossRef]
Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; Cox, D.D. Hyperopt: A python library for model selection and hyperparameter optimization. Comput. Sci. Discov. 2015, 8, 014008. [Google Scholar] [CrossRef]
Liao, L.; Li, H.; Shang, W.; Ma, L. An empirical study of the impact of hyperparameter tuning and model optimization on the performance properties of deep neural networks. ACM Trans. Softw. Eng. Methodol. (TOSEM) 2022, 31, 53. [Google Scholar] [CrossRef]
Kapoor, S.; Cantrell, E.M.; Peng, K.; Pham, T.H.; Bail, C.A.; Gundersen, O.E.; Hofman, J.M.; Hullman, J.; Lones, M.A.; Malik, M.M.; et al. REFORMS: Consensus-based recommendations for machine-learning-based science. Sci. Adv. 2024, 10, eadk3452. [Google Scholar] [CrossRef]
Cho, H.; Kim, Y.; Lee, E.; Choi, D.; Lee, Y.; Rhee, W. Basic enhancement strategies when using Bayesian optimization for hyperparameter tuning of deep neural networks. IEEE Access 2020, 8, 52588–52608. [Google Scholar] [CrossRef]
Vincent, A.M.; Jidesh, P. An improved hyperparameter optimization framework for AutoML systems using evolutionary algorithms. Sci. Rep. 2023, 13, 4737. [Google Scholar] [CrossRef]
Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for hyper-parameter optimization. Adv. Neural Inf. Process. Syst. 2011, 24. [Google Scholar]
Joy, T.T.; Rana, S.; Gupta, S.; Venkatesh, S. Fast hyperparameter tuning using Bayesian optimization with directional derivatives. Knowl.-Based Syst. 2020, 205, 106247. [Google Scholar] [CrossRef]
Hutter, F.; Hoos, H.H.; Leyton-Brown, K. Sequential model-based optimization for general algorithm configuration. In Proceedings of the International Conference on Learning and Intelligent Optimization, Rome, Italy, 17–21 January 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 507–523. [Google Scholar]
Onorato, G. Bayesian optimization for hyperparameters tuning in neural networks. arXiv 2024, arXiv:2410.21886. [Google Scholar] [CrossRef]
Vangelatos, Z.; Sheikh, H.M.; Marcus, P.S.; Grigoropoulos, C.P.; Lopez, V.Z.; Flamourakis, G.; Farsari, M. Strength through defects: A novel Bayesian approach for the optimization of architected materials. Sci. Adv. 2021, 7, eabk2218. [Google Scholar] [CrossRef]
Pyzer-Knapp, E.O.; Chen, L.; Day, G.M.; Cooper, A.I. Accelerating computational discovery of porous solids through improved navigation of energy-structure-function maps. Sci. Adv. 2021, 7, eabi4763. [Google Scholar] [CrossRef]
Zhang, K.; Liu, S.; Bai, L.; Cao, Y.; Yan, Z. Effects of underground mining on soil–vegetation system: A case study of different subsidence areas. Ecosyst. Health Sustain. 2023, 9, 0122. [Google Scholar] [CrossRef]
Hoyt, A.M.; Chaussard, E.; Seppalainen, S.S.; Harvey, C.F. Widespread subsidence and carbon emissions across Southeast Asian peatlands. Nat. Geosci. 2020, 13, 435–440. [Google Scholar] [CrossRef]
Yao, S.; He, Y.; Zhang, L.; Yang, W.; Chen, Y.; Sun, Q.; Zhao, Z.; Cao, S. A convLSTM neural network model for spatiotemporal prediction of mining area surface deformation based on SBAS-InSAR monitoring data. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5201722. [Google Scholar] [CrossRef]
Yao, W.; Azadnejad, S.; Huang, B.; Donohue, S.; Dev, S. A Deep Learning Approach for Spatio-Temporal Forecasting of InSAR Ground Deformation in Eastern Ireland. arXiv 2025, arXiv:2509.18176. [Google Scholar]
Li, S.; Zhao, Y.; Varma, R.; Salpekar, O.; Noordhuis, P.; Li, T.; Paszke, A.; Smith, J.; Vaughan, B.; Damania, P.; et al. Pytorch distributed: Experiences on accelerating data parallel training. arXiv 2020, arXiv:2006.15704. [Google Scholar] [CrossRef]
Feurer, M.; Hutter, F. Hyperparameter optimization. In Automated Machine Learning: Methods, Systems, Challenges; Springer International Publishing: Cham, Switzerland, 2019; pp. 3–33. [Google Scholar]
Derrick, B.; Toher, D.; White, P. Why Welch’s test is Type I error robust. Quant. Methods Psychol. 2016, 12, 30–38. [Google Scholar] [CrossRef]
Yekutieli, D.; Benjamini, Y. Resampling-based false discovery rate controlling multiple test procedures for correlated test statistics. J. Stat. Plan. Inference 1999, 82, 171–196. [Google Scholar] [CrossRef]
Stephens, M. False discovery rates: A new deal. Biostatistics 2017, 18, 275–294. [Google Scholar] [CrossRef]
Kayhan, O.S.; van Gemert, J.C. On translation invariance in CNNs: Convolutional layers can exploit absolute spatial location. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 14274–14285. [Google Scholar]
Huang, W.; Rong, Y.; Xu, T.; Sun, F.C.; Huang, J.Z. Tackling over-smoothing for general graph convolutional networks. arXiv 2020, arXiv:2008.09864. [Google Scholar]
Yun, J. Mitigating gradient overlap in deep residual networks with gradient normalization for improved non-convex optimization. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 15–18 December 2024; IEEE: New York, NY, USA, 2024; pp. 3831–3837. [Google Scholar]
Shu, C.; Meng, Z.; Yang, Y.; Wang, Y.; Liu, S.; Zhang, X.; Zhang, Y. Deep learning-based InSAR time-series deformation prediction in coal mine areas. Geo-Spat. Inf. Sci. 2025, 28, 2119–2141. [Google Scholar] [CrossRef]
Ferretti, A.; Prati, C.; Rocca, F. Permanent scatterers in SAR interferometry. IEEE Trans. Geosci. Remote Sens. 2002, 39, 8–20. [Google Scholar] [CrossRef]
Yu, C.; Li, Z.; Penna, N.T.; Crippa, P. Generic atmospheric correction model for interferometric synthetic aperture radar observations. J. Geophys. Res. Solid Earth 2018, 123, 9202–9222. [Google Scholar] [CrossRef]
Kirui, P.; Riedel, B.; Gerke, M. Performance of numerical weather products for InSAR tropospheric correction: A case study of a tropical region. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2022, 3, 115–122. [Google Scholar] [CrossRef]
Rubanova, Y.; Chen, R.T.; Duvenaud, D.K. Latent ordinary differential equations for irregularly-sampled time series. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]
Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat, F. Deep learning and process understanding for data-driven Earth system science. Nature 2019, 566, 195–204. [Google Scholar] [CrossRef] [PubMed]
Ovadia, Y.; Fertig, E.; Ren, J.; Nado, Z.; Sculley, D.; Nowozin, S.; Dillon, J.; Lakshminarayanan, B.; Snoek, J. Can you trust your model’s uncertainty? Evaluating predictive uncertainty under dataset shift. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar]

Figure 1. Study area overview. (a) Study area. (b) Ground motion information within the region. The blue dashed box in (a) represents the study area, and (b) is a magnified view of the study area, where deformation data were obtained from EGMS. The base diagrams for (a,b) are digital elevation models.

Figure 2. Spatiotemporal prediction framework based on Bayesian optimization. The data preprocessing in the figure means converting sparsely distributed high-coherence point data into image data. Spatial encoding means using CNN or GCN to extract spatial features of displacement. Spatiotemporal modeling means using LSTM or Transformer to implement time series modeling of the spatial features extracted in the previous step. TPE optimization implements hyperparameter selection for different models in this process.

Figure 3. Comparison of model performance using TPE optimization and manual parameter tuning. (a–c) The performance of all models using

R M S E

,

R^{2}

, and

M A E

, respectively. (d) Performance improvement after TPE-based hyperparameter optimization compared with manual tuning. CL, CT, GL, GT, RL, and RT denote CNN + LSTM; CNN + Transformer; GCN + LSTM; GCN + Transformer; ResNet + LSTM; and ResNet + Transformer, respectively. In the diagram (a–c), different colored bands represent the positions of different architectures on the X-axis, and statistical indicators on the same X-axis are placed in the same colored band box.

Figure 3. Comparison of model performance using TPE optimization and manual parameter tuning. (a–c) The performance of all models using

R M S E

,

R^{2}

, and

M A E

, respectively. (d) Performance improvement after TPE-based hyperparameter optimization compared with manual tuning. CL, CT, GL, GT, RL, and RT denote CNN + LSTM; CNN + Transformer; GCN + LSTM; GCN + Transformer; ResNet + LSTM; and ResNet + Transformer, respectively. In the diagram (a–c), different colored bands represent the positions of different architectures on the X-axis, and statistical indicators on the same X-axis are placed in the same colored band box.

Figure 4. Average performance improvement of TPE over manual tuning. Each row corresponds to an architecture: the first column lists the architecture; the second shows the mean difference and its 95% confidence interval (CI) (

R M S E

/

M A E

: Manual–TPE;

R^{2}

: TPE–Manual); the third shows horizontal error bars with the zero reference as a dashed line; and the fourth reports the FDR-corrected p-value and significance marker (red dot), as well as the conservative lower bound (LB, 95%). All

p_adj

values are significant and no CI crosses zero, indicating consistent and robust gains from TPE across architectures and metrics.

Figure 4. Average performance improvement of TPE over manual tuning. Each row corresponds to an architecture: the first column lists the architecture; the second shows the mean difference and its 95% confidence interval (CI) (

R M S E

/

M A E

: Manual–TPE;

R^{2}

: TPE–Manual); the third shows horizontal error bars with the zero reference as a dashed line; and the fourth reports the FDR-corrected p-value and significance marker (red dot), as well as the conservative lower bound (LB, 95%). All

p_adj

values are significant and no CI crosses zero, indicating consistent and robust gains from TPE across architectures and metrics.

Figure 5. Validation loss curve distribution during TPE optimization process. (a,b) Comparison of the optimisation process for fixed-time encoding modules. (c–e) Optimisation process during fixed-space encoding. Dashed lines denote 10 TPE optimisation iterations, while solid lines represent the average effective loss across 10 optimisation runs. The meanings of CL, CT, GL, GT, RL, and RT correspond to the abbreviations in Figure 3. Grey and red dashed lines serve as reference lines. H1, H2, and V1–V4 are auxiliary lines illustrating the density of loss curves across different architectures. The green auxiliary lines in (c–e) depict the iteration speeds of each architecture.

Figure 6. Schematic diagram of the parameter space of different architectures. (a) Transformer-based architectures exhibit a broad plateau, as shown by the arrow, and a narrow optimal core close to the plateau. (b) LSTM-based architectures have a relatively flat landscape containing multiple suboptimal regions.

Figure 7. Deformation susceptibility and reliability based on ensemble prediction. (a) Susceptibility map derived from the ensemble median

| m (x) |

with global thresholds

Q_{85}

,

Q_{95}

, and

Q_{99}

. (b) Intra-class confidence score for the assigned susceptibility level in (a), defined as the probability mass of that class derived from exceedance probabilities. (c) Ensemble predictive uncertainty of displacement magnitude. (d) Ensemble rank variance of susceptibility class assignment.

Figure 7. Deformation susceptibility and reliability based on ensemble prediction. (a) Susceptibility map derived from the ensemble median

| m (x) |

with global thresholds

Q_{85}

,

Q_{95}

, and

Q_{99}

. (b) Intra-class confidence score for the assigned susceptibility level in (a), defined as the probability mass of that class derived from exceedance probabilities. (c) Ensemble predictive uncertainty of displacement magnitude. (d) Ensemble rank variance of susceptibility class assignment.

Figure 8. Bivariate mapping of deformation susceptibility levels and uncertainty. The susceptibility levels are consistent with those in Figure 7a, divided into Low, Medium, and High. Uncertainty is determined by dividing Figure 7c into Low, Medium, and High Uncertainty. In the figure, the vertical direction of the different colored squares represents the susceptibility level, with the level increasing from bottom to top. The horizontal direction of the squares represents the uncertainty level, with the level increasing from left to right.

Figure 9. Spatial residual diagnostics for the TPE-ResNet+Transformer model. (a) Spatial distribution of signed residuals (red: overestimation; blue: underestimation). (b) Residual Q–Q plot with the theoretical line, OLS fit, and robust fit; dashed gray lines mark symmetric quantiles (left tail: Q1/Q2.5/Q5; right tail: Q95/Q97.5/Q99), and hollow symbols highlight extreme tails (≥Q99 or ≤Q1).

Figure 10. Residual distributions of models with TPE optimization versus manual tuning. (a–c, g–i) The residual distributions after manual parameter tuning. (d–f, j–l) The residual distributions after TPE optimization. The area enclosed by the red dashed circle/rectangle and the black rectangle represents the regions where the spatial residuals decreased before and after optimization. The thicker red dashed line indicates the trend of the fault. The CNN + LSTM, CNN + Transformer, GCN + LSTM, GCN + Transformer, ResNet + LSTM, and ResNet + Transformer models are abbreviated as CNN + L, CNN + T, GCN + L, GCN + T, ResNet + L, and ResNet + T, respectively. The abbreviations are prefixed with M, such as M-CNN + L, denoting a manually tuned model, and prefixed with TPE, such as TPE-CNN + L, representing a TPE-optimized model.

Figure 11. Spatial distribution of exceedance probabilities for three deformation severity thresholds. Panels (a–c) show the exceedance probabilities

P_{85} (x)

,

P_{95} (x)

, and

P_{99} (x)

, defined as the fraction of ensemble members whose predicted displacement magnitudes

| d_{k} (x) |

exceed the global 85th, 95th, and 99th percentile thresholds derived from

| m (x) |

. Higher values indicate a greater likelihood that a pixel reaches at least the corresponding deformation intensity level across the ensemble.

Figure 11. Spatial distribution of exceedance probabilities for three deformation severity thresholds. Panels (a–c) show the exceedance probabilities

P_{85} (x)

,

P_{95} (x)

, and

P_{99} (x)

, defined as the fraction of ensemble members whose predicted displacement magnitudes

| d_{k} (x) |

exceed the global 85th, 95th, and 99th percentile thresholds derived from

| m (x) |

. Higher values indicate a greater likelihood that a pixel reaches at least the corresponding deformation intensity level across the ensemble.

Table 1. The mean values of evaluation metrics.

Model	Manual			TPE
Model	$RMSE$	$MAE$	$R^{2}$	$RMSE$	$MAE$	$R^{2}$
CNN + LSTM	3.091	1.991	0.886	2.887	1.834	0.904
CNN + Transformer	2.851	1.861	0.903	2.717	1.730	0.915
GCN + LSTM	3.375	2.172	0.864	3.170	1.970	0.885
GCN + Transformer	3.382	2.208	0.864	2.853	1.754	0.906
ResNet + LSTM	2.885	1.832	0.901	2.750	1.728	0.913
ResNet + Transformer	2.662	1.704	0.916	2.464	1.571	0.930

Note: The values in the table are the average evaluation index of each architecture across 10 experiments.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Multiple requests from the same IP address are counted as one view.