1. Introduction
Surface deformation monitoring is vital for disaster prevention and safety assessment in mining and urban environments. Over the past decades, Interferometric Synthetic Aperture Radar (InSAR) has emerged as a primary technique for deformation monitoring due to its wide spatial coverage and high measurement precision [1,2]. InSAR-derived deformation time series not only support retrospective analysis but also provide a fundamental basis for forecasting future surface motion, which is crucial for early warning and risk mitigation. In mining areas, surface deformation is typically continuous and dynamic, and time-series analysis enables researchers to track deformation trends, detect abnormal variations, and identify potential subsidence-related hazards in a timely manner [3].
Classical statistical and physical models, such as least-squares inversion methods [4] and finite element methods [5], typically rely on restrictive assumptions and fixed governing equations to simulate deformation. These models can perform well when sufficient data are available but often become unstable on small-sample or incomplete datasets due to unreliable parameter estimation and potential overfitting [6]. With the rapid advancement of deep learning, approaches including Recurrent Neural Networks (RNNs) [7], Long Short-Term Memory networks (LSTMs) [8], Gated Recurrent Units (GRUs) [9], and Convolutional Neural Networks (CNNs) [10] have been widely applied to surface deformation prediction and generally outperform traditional models. Recent studies further extend these models to more complex settings, including Transformer-based frameworks for long-term deformation prediction [11], hybrid CNN–RNN architectures with attention mechanisms for step-like or large-scale displacement modeling [12,13,14], and spatio-temporal networks for landslide deformation prediction and early warning [15,16]. In mining scenarios, SBAS-InSAR combined with deep sequence models has also demonstrated promising subsidence forecasting performance [17,18]. Nevertheless, most deep learning models still require sufficiently dense and informative time series, and they may suffer from unstable training and limited interpretability under sparse observations and small-sample conditions.
Although InSAR is effective for monitoring mining-induced deformation, temporal sampling density and the availability of high-quality deformation time series remain limited in practice. For instance, Sentinel-1 acquisitions are constrained by revisit intervals and may be further affected by decorrelation and data gaps, leading to sparsely sampled deformation sequences in many monitoring scenarios [19]. As a consequence, the observed time series often contain missing observations and uneven temporal spacing, which increases uncertainty and constrains data-driven forecasting models [20,21]. Deep learning models generally require sufficient, temporally continuous, and information-rich time-series inputs to ensure stable training and reliable generalization. Under small-sample settings, predictive accuracy and robustness can be significantly degraded due to inadequate temporal information and weakened representation learning [22,23].
To mitigate data scarcity, generative approaches have been introduced to augment time-series datasets [24]. Time-series Generative Adversarial Networks (TimeGANs) have demonstrated strong capability in synthesizing high-fidelity temporal samples while preserving key dynamic characteristics. TimeGAN-based augmentation has been explored in various small-sample forecasting and diagnostic tasks, including fault diagnosis [25], degradation-sequence generation [26], photovoltaic power forecasting [27], infrastructure strain prediction [28], and financial time-series prediction [29]. These studies suggest that generative augmentation can alleviate the adverse impact of limited observations on model training. However, data augmentation alone does not guarantee reliable and physically plausible deformation forecasting, because black-box predictors may still produce unrealistic oscillations or trend reversals. Moreover, the systematic use of TimeGAN for mining-area surface deformation forecasting remains limited, especially when strong nonlinearity must be reconciled with interpretability requirements.
Therefore, beyond improving data representativeness, it is crucial to develop a forecasting model that is both physically plausible and interpretable for safety-critical deformation prediction. Kolmogorov–Arnold Networks (KANs) have recently emerged as an interpretable alternative to multi-layer perceptron (MLP)-based architectures due to their spline-based functional representations, enabling transparent nonlinear modeling [30]. Representative applications include renewable-energy and meteorological forecasting, such as solar radiation and temperature prediction [31], electricity demand forecasting in power systems [32], and financial time-series analysis where interpretability is emphasized [33]. In addition, for general multivariate time-series prediction problems, empirical evidence has further confirmed the feasibility of KAN in capturing complex temporal dependencies [34]. These advances provide a solid foundation for applying KAN to time-series forecasting; however, in mining-induced deformation scenarios governed by explicit physical constraints, the stability and physical consistency of KAN-based models remain to be systematically investigated.
In this study, we propose a generation–prediction–interpretation framework for mining-induced deformation forecasting under sparse and small-sample InSAR observations. The framework integrates a TCN-enhanced TimeGAN for deformation-sequence augmentation and a physics-informed KAN-based predictor for physically plausible and interpretable forecasting. Experiments on InSAR time series from the mining area demonstrate that the proposed approach improves prediction accuracy and stability while also enabling transparent analysis of temporal contributions.
3. Experiment and Results
3.1. InSAR Monitoring Results
This section first validates the accuracy of the SBAS-InSAR deformation estimates using leveling observations. On this basis, the spatial distribution and temporal evolution of mining-induced deformation over the study area are then analyzed. Unless otherwise stated, negative values indicate subsidence, and the deformation rate is estimated from the SBAS displacement time series.
Six leveling benchmarks were established across the study area. Points A to C were located within the working face, point D was placed near the water body, and points E and F followed the longitudinal profile. Third-order leveling was carried out from 10 July 2021 to 18 February 2023 to assess the SBAS-InSAR deformation estimates. For each benchmark, the InSAR displacement was sampled from the nearest pixel and referenced to the same baseline date. The comparison results are summarized in
Table 2.
Table 2 shows that the absolute differences between SBAS-InSAR and leveling range from 3.71 to 7.12 mm, with a mean absolute error of 5.23 mm and a root mean square error of 5.45 mm. These results indicate good agreement between the two datasets and confirm that the InSAR-derived deformation estimates are sufficiently reliable for the subsequent deformation analysis. After confirming the reliability of the SBAS-InSAR results, we next analyze the spatial pattern and temporal evolution of surface deformation in the study area.
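The agreement statistics quoted above (mean absolute error and root mean square error of the InSAR-minus-leveling differences) can be sketched as follows. This is a minimal illustration rather than the authors' processing code; the function name `agreement_stats` and the displacement values are hypothetical and are not the Table 2 data.

```python
import math


def agreement_stats(insar_mm, leveling_mm):
    """Return (MAE, RMSE) of the InSAR-minus-leveling differences, in mm."""
    if len(insar_mm) != len(leveling_mm) or not insar_mm:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [a - b for a, b in zip(insar_mm, leveling_mm)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


# Hypothetical cumulative displacements (mm) at six benchmarks, NOT Table 2.
insar = [-120.0, -80.0, -60.0, -200.0, -40.0, -310.0]
level = [-124.0, -85.0, -57.0, -206.0, -45.0, -303.0]
mae, rmse = agreement_stats(insar, level)
print(f"MAE = {mae:.2f} mm, RMSE = {rmse:.2f} mm")
```

Because RMSE weights large residuals more heavily, it can never be smaller than MAE for the same residuals, which is consistent with the reported 5.45 mm versus 5.23 mm.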
Figure 5 presents the annual mean deformation rate. A pronounced subsidence bowl is evident above the goaf of the 1611 (1) working face. The rate is derived from the SBAS displacement time series and reported in the line-of-sight direction. Negative values represent subsidence, while near-zero to positive values are shown in separate classes on the map for clearer physical interpretation. The strongest subsidence is confined to the 1611 (1) goaf, where the peak rate reaches −438.7 mm per year.
Moderate subsidence is mainly distributed in the southern sector of the 1613 (3) working face. This pattern is likely associated with the superposed influence of the retreat of 1611 (1). Elsewhere within 1613 (3), deformation remains minor, indicating generally stable surface conditions during the observation period. The corresponding cumulative deformation is shown in
Figure 6.
Figure 6 shows that the maximum cumulative deformation occurs in the zone between the 1613 (3) and 1611 (1) working faces, where a clear subsidence bowl has developed. Over the study period, the cumulative line-of-sight displacement ranges from −613.9 mm to 69.6 mm. Positive values are mainly found near the margins and in areas with weak deformation. They do not represent the same physical process as the negative subsidence signal, but mainly reflect relative motion with respect to the chosen reference area and local residual variability. To better track the temporal evolution, we provide nine cumulative deformation maps at roughly 2–3-month intervals in Figure 7. All maps are referenced to the baseline date of 10 July 2021.
Figure 7 depicts the spatiotemporal evolution of cumulative deformation. Between August and October 2021, deformation is weak and shows no coherent spatial pattern. A localized subsidence center emerges near the working face in early 2022. From March to May 2022, the subsidence bowl intensifies and expands along the strike of the working face.
After mid-2022, the spatial footprint of the bowl changes little, whereas the cumulative displacement continues to grow. The maximum deformation is reached on 18 February 2023. In general, deformation clusters around the goaf and its surrounding area, supporting a mining-related origin.
The cumulative deformation time series for all six benchmarks are presented in
Figure 8.
Figure 8 further reveals clear spatial differences: Point F shows the greatest subsidence, followed by Point D, while Point A is moderate and Points B, E, and C exhibit smaller deformations. These variations reflect the spatial heterogeneity caused by mining activities and indicate that the subsidence at these points is representative of the study area’s overall deformation pattern. This provides a reliable data basis for the subsequent experiments and modeling.
3.2. Effectiveness of the Generative Model
Each InSAR monitoring point provides only 49 observations, which is limited for training deep sequence models and increases the risk of overfitting. To mitigate this issue, we apply the proposed TCN-TimeGAN to augment the training data at each point. For each site, the generator produces approximately four times as many synthetic training sequences as real ones, and the downstream predictor is trained on a mixed dataset consisting of real and generated sequences with an approximate ratio of 1:4. This augmentation ratio was adopted to substantially increase training diversity under the small-sample setting while avoiding excessive reliance on synthetic data, which may introduce distributional bias. The generator is trained on the training split only and outputs synthetic sequences with the same window length as the real samples. We first assessed generation quality using t-distributed Stochastic Neighbor Embedding (t-SNE) visualization [45].
Hyperparameters were selected to balance capacity and stability under small-sample InSAR series (49 observations per site). We kept a compact model (hidden_dim = 8 with shallow stacks) and a moderate window length (16) to preserve temporal context while maintaining enough training windows. Larger hidden dimensions were also tested during pilot runs but tended to overfit due to the limited number of training samples. The noise dimension (noise_dim = 16) controls the diversity of the latent noise vector used to generate synthetic sequences; a moderate value was adopted to provide sufficient variability while maintaining stable adversarial training. The learning rate (0.005), batch size (16), and number of attention heads (4) were selected based on pilot experiments using the same generation-quality criteria, with the aim of achieving stable convergence and high-fidelity sequence synthesis. The specific network hyperparameter configurations are detailed in
Table 3.
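To make the trade-off between window length and the number of training windows concrete, the following sketch slices a series into overlapping windows. It assumes a stride of one acquisition, which is not stated in this section; `make_windows` is a hypothetical helper, and the series is a placeholder rather than real deformation data.

```python
def make_windows(series, window=16):
    """Slice a series into overlapping windows, assuming a stride of 1."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [series[i:i + window] for i in range(len(series) - window + 1)]


# Placeholder series with 49 acquisitions (not real deformation values).
series = [float(-t) for t in range(49)]
all_windows = make_windows(series, window=16)

# If the final eight acquisitions are held out, 41 remain for training.
train_windows = make_windows(series[:41], window=16)
print(len(all_windows), len(train_windows))
```

Under these assumptions a 49-point series yields 34 windows of length 16, and a 41-point training segment yields 26, which illustrates why a longer window would quickly exhaust the available samples.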
To assess the contribution of the key architectural components in TCN-TimeGAN, we conduct a staged ablation study on the temporal modeling backbone used in both the generator and discriminator. Starting from an RNN-based backbone as the baseline, we replace it with a TCN to enlarge the temporal receptive field, and then add a self-attention layer on top of the TCN (SA-TCN). All other settings follow
Table 3, so the observed differences can be attributed to the backbone design.
Figure 9 shows the loss convergence trajectories of all variants during training.
As shown in
Figure 9, introducing TCN and SA-TCN leads to faster and more stable loss reduction than the RNN baseline, and the final steady level is also lower. The recurrent baseline exhibits pronounced oscillations in both generator and discriminator losses, together with slower convergence, suggesting less stable optimization when modeling long-range dependencies under adversarial learning. After replacing the backbone with TCN, the oscillation amplitude decreases markedly and the convergence rate improves, which is consistent with the benefit of an enlarged temporal receptive field. When self-attention is further added on top of TCN, the loss decreases again and oscillations are suppressed more effectively, indicating additional gains from global dependency modeling beyond convolutional context aggregation.
With the improved adversarial training stability validated above, we next evaluate the fidelity and distribution alignment of the generated sequences using t-SNE visualization. To assess the effectiveness of the proposed design, we compare it with standard TimeGAN as a baseline. Feature visualizations contrasting the augmented and original sequences at each monitoring site are shown in
Figure 10 and
Figure 11. For visualization clarity, the t-SNE plots are generated using randomly sampled subsets of both real and synthetic sequences rather than the entire augmented dataset. This sampling strategy keeps the point densities comparable and improves visual interpretability.
Figure 10 shows the t-SNE projections of the synthetic sequences generated by the traditional TimeGAN alongside the original sequences at six monitoring sites. The traditional model reproduces the general distributional patterns of the real data; however, noticeable discrepancies persist at several sites, including distributional shifts, local cluster misalignment, and insufficient fitting near the boundaries. These issues indicate that the traditional TimeGAN has limited capability in capturing complex deformation patterns.
In contrast, the improved TimeGAN demonstrates markedly more stable behavior across all monitoring sites, as illustrated in
Figure 11. The synthetic samples closely match the original data in terms of local geometric structures, overall trajectory trends, and boundary contours. The overlap in the feature space is markedly improved, demonstrating that the generated samples effectively capture the intrinsic manifold of the real deformation series.
We additionally employed three quantitative fidelity metrics to rigorously evaluate generation quality: (1) Maximum Mean Discrepancy (MMD) to measure distributional similarity between real and generated data [46]; (2) Dynamic Time Warping (DTW) to quantify temporal alignment with the nearest real sequence [47]; and (3) Coverage to evaluate how well the synthetic data span the real distribution [48]. The quantitative results of these metrics are reported in Table 4.
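Two of these metrics can be sketched in a few lines. The block below is an illustrative implementation, not the evaluation code behind Table 4: it assumes an absolute-difference local cost for DTW, a Gaussian (RBF) kernel with a hypothetical bandwidth parameter `gamma` for MMD, and the biased estimator of squared MMD. The kernel, bandwidth, and any normalization used in the study are not given in this section. Coverage is omitted because its definition varies between papers.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def mean_nearest_dtw(synthetic, real):
    """Average DTW distance from each synthetic sequence to its nearest real one."""
    return sum(min(dtw_distance(s, r) for r in real) for s in synthetic) / len(synthetic)


def rbf_kernel(x, y, gamma):
    """Gaussian kernel between two equal-length sequences."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq)


def mmd2(real, synthetic, gamma=1.0):
    """Biased estimate of squared MMD under an RBF kernel (gamma is assumed)."""
    def mean_k(xs, ys):
        return sum(rbf_kernel(x, y, gamma) for x in xs for y in ys) / (len(xs) * len(ys))
    return mean_k(real, real) + mean_k(synthetic, synthetic) - 2.0 * mean_k(real, synthetic)
```

Lower values of both quantities indicate closer agreement between real and synthetic windows, which matches the direction of improvement reported for TCN-TimeGAN.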
Table 4 shows that TCN-TimeGAN outperforms the conventional TimeGAN at all six monitoring points. It yields lower MMD and DTW and higher Coverage throughout. On average, MMD drops by about 68 percent, indicating a closer match between the synthetic samples and the real data distribution. The mean DTW decreases from 0.48 to 0.29, reflecting better alignment of temporal patterns. Coverage also improves, rising from 0.78 to 0.91 on average, which suggests broader support over the real data space. Taken together, these gains point to synthetic sequences that are both more realistic and more diverse than those produced by the baseline. This provides a stronger augmented dataset for the subsequent prediction experiments.
When the generation quality is evaluated only by t-SNE visualization together with statistical metrics such as MMD, DTW, and Coverage, synthetic sequences may appear statistically similar to real data while still failing to reproduce the underlying temporal dynamics. To further examine whether the sequences generated by TCN-TimeGAN preserve the essential dynamic characteristics of the real subsidence process, an additional diagnostic analysis based on sliding windows with a length of 16 is conducted, as illustrated in
Figure 12. Real and synthetic samples are compared from four aspects, including rate magnitude, smoothness behavior, stage structure, and short-term autocorrelation. All synthetic sequences are generated using models trained only on the training portion of the data.
Figure 12 compares the dynamic behavior of the real subsidence sequences with that of the TCN-TimeGAN-generated samples from four perspectives, including rate magnitude, smoothness, stage evolution, and short-term temporal dependence. In
Figure 12a,b, the synthetic sequences show distributions of peak subsidence rate and smoothness that remain close to those of the real data, with small W1 and KS distances and low violation rates. This suggests that the generated samples preserve the main intensity and regularity of short-term deformation without introducing obvious unrealistic fluctuations.
Figure 12c,d further examine whether the temporal evolution is maintained. The estimated changepoint locations and peak-rate timing exhibit similar error distributions between real and synthetic sequences, indicating that the principal stage structure of the subsidence process is broadly retained. In addition, the autocorrelation curves of the rate series remain close, and both ACF distance and DTW statistics stay small, supporting the consistency of short-term temporal dynamics. Taken together, these results suggest that TCN-TimeGAN does not merely reproduce the overall distribution of the training data, but also captures the main dynamic characteristics of the subsidence process.
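The short-term dependence check can be illustrated with a small sketch that converts a displacement window into a rate series and compares autocorrelation curves. The exact definitions behind Figure 12 (for example, how the ACF distance is aggregated) are not given in this section, so the mean absolute difference over lags used here is an assumption, and all function names are hypothetical.

```python
def rates(series):
    """First differences of a displacement series (displacement per interval)."""
    return [b - a for a, b in zip(series, series[1:])]


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (0.0 for a constant series)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var


def acf_distance(x, y, max_lag=5):
    """Mean absolute difference between two ACF curves over lags 1..max_lag."""
    diffs = [abs(autocorr(x, k) - autocorr(y, k)) for k in range(1, max_lag + 1)]
    return sum(diffs) / len(diffs)
```

A typical use would apply `rates` to a real window and to a synthetic window of the same length and then pass both rate series to `acf_distance`; values near zero indicate similar short-term dynamics.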
3.3. Parameter Efficiency and Robustness Analysis of KAN
We examine how prediction performance changes with model capacity and assess the robustness of KAN against conventional neural architectures. In standard multilayer perceptrons, nonlinear mapping is realized mainly through stacked fully connected layers, which can lead to parameter redundancy. KAN adopts a different parameterization. It models nonlinearity with learnable univariate functions placed on edges and aggregates their contributions by summation. This structure improves interpretability and can reduce redundant degrees of freedom.
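The edge-based parameterization described above can be sketched with a deliberately simplified layer. As an assumption made for brevity, each edge function here is piecewise linear (a degree-1 B-spline) on a uniform grid over [-1, 1]; practical KAN implementations typically use higher-order B-splines together with a base activation. The names `edge_function`, `kan_layer`, and `n_edge_params` are hypothetical.

```python
def edge_function(x, knot_values, lo=-1.0, hi=1.0):
    """Piecewise-linear univariate function with len(knot_values) - 1 segments."""
    segments = len(knot_values) - 1
    x = min(max(x, lo), hi)                      # clamp to the grid range
    pos = (x - lo) / (hi - lo) * segments        # position in segment units
    k = min(int(pos), segments - 1)              # index of the active segment
    frac = pos - k
    return (1.0 - frac) * knot_values[k] + frac * knot_values[k + 1]


def kan_layer(inputs, coeffs):
    """coeffs[j][i] holds the knot values of the edge from input i to output j.

    Each output is the sum of its incoming edge functions, with no weight
    matrix and no node-level activation.
    """
    return [sum(edge_function(x, kv) for x, kv in zip(inputs, row)) for row in coeffs]


def n_edge_params(n_in, n_out, grid_size):
    """Learnable values in this simplified layer: one per knot per edge."""
    return n_in * n_out * (grid_size + 1)


# Two inputs, one output, two segments per edge (illustrative values only).
out = kan_layer([0.0, -0.5], [[[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]])
print(out, n_edge_params(16, 10, 10))
```

In this simplified form, a layer with 16 inputs, 10 outputs, and 10 segments per edge carries 1760 learnable knot values, which shows how both width and grid resolution drive capacity.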
In the KAN model, the grid size is a key hyperparameter: it sets the number of intervals into which the range of each input feature is divided when constructing the spline functions. It directly affects the model’s expressive power and computational cost. A larger grid size yields a finer spline function, which can capture more intricate patterns and nonlinear relationships in the input features. However, an excessively large grid size may lead to overfitting, higher computational cost, and slower training.
Conversely, a smaller grid size produces a smoother and coarser spline function, speeding up computation but potentially failing to learn detailed features in the time series data, resulting in underfitting. To determine the optimal grid size, we conducted experiments with grid sizes of 1, 5, 10, 15, 20, and 25, comparing them by test loss and R² score. We selected the TCN-TimeGAN augmented dataset at point A as the experimental dataset. After normalizing the data, 80% of the dataset was used as the training set, 10% as the validation set, and the remaining 10% as the test set. The detailed settings of the model’s parameters are listed in Table 5, and the experimental results are shown in Figure 13.
As shown in Figure 13, the test loss first decreases and then increases as the grid size increases. When the grid size is 1 or 5, the model fails to effectively learn the temporal features of the data, resulting in a relatively high test loss. At a grid size of 10, the model achieves the lowest test loss. For grid sizes of 15, 20, and 25, although the model’s fitting capability improves, overfitting occurs, leading to a decline in generalization performance and an increase in test loss. The corresponding R² values show a similar trend, reaching near-maximum values at a grid size of 15, indicating that the model best fits the overall trend at this grid size. Because the test loss is lowest at a grid size of 10, the grid size was set to 10 in subsequent experiments.
In our setting, capacity is largely controlled by network width, measured by the number of neurons per layer. We therefore test whether KAN preserves accuracy when the width is constrained. This provides an empirical view of its robustness under limited capacity. BiGRU is widely used for time-series prediction [49,50,51] and serves as the baseline. This experiment is designed to address two questions. First, we test whether KAN achieves higher accuracy than BiGRU under the same protocol. Second, we examine how KAN performs when model capacity is constrained by a small width.
We use the TCN-TimeGAN augmented datasets from monitoring points B and E as representative cases. After normalization, each dataset is split chronologically into 80% for training, 10% for validation, and 10% for testing to prevent temporal leakage. This split is used only for the capacity-sensitivity analysis. The same split and evaluation procedure are used for both models to ensure a fair comparison.
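A chronological 80/10/10 split keeps the samples in time order and never shuffles them, so every validation and test sample lies after the training samples. The sketch below uses integer arithmetic for the cut points; `chrono_split` is a hypothetical helper and the sample list is a placeholder.

```python
def chrono_split(samples, train_pct=80, val_pct=10):
    """Split time-ordered samples into train/validation/test without shuffling."""
    n = len(samples)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return samples[:i], samples[i:j], samples[j:]


# Placeholder: 100 time-ordered samples identified by their index.
train, val, test = chrono_split(list(range(100)))
print(len(train), len(val), len(test))
```

Because the slices are contiguous, the latest training index always precedes the earliest validation index, which in turn precedes the earliest test index.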
For KAN, the core configuration is kept fixed across all runs. For the BiGRU baseline, we set the dropout rate to 0.3 and keep the remaining settings consistent with the protocol. We vary the neuron count at 8, 16, 32, and 64. Here, neuron count denotes the layer width in KAN and the hidden size in BiGRU. Performance is evaluated using RMSE and R². The results are summarized in Figure 14.
Figure 14 demonstrates a consistent advantage of KAN over BiGRU across all tested capacities, with the largest gains at the smallest neuron counts. At point B with 8 neurons, KAN attains an RMSE of 0.986 mm and an R² of 0.952, whereas BiGRU yields 1.325 mm and 0.762. At point E under the same setting, KAN reports 1.122 mm and 0.922, compared with 1.526 mm and 0.723 for BiGRU. These gaps correspond to BiGRU RMSE values that are 34.4% and 36.0% higher than those of KAN at points B and E, respectively. Collectively, the results indicate that BiGRU is more sensitive to capacity reduction, while KAN preserves accuracy more effectively in the low-capacity regime.
At 64 neurons, BiGRU largely closes the gap to KAN. RMSE is 0.953 mm for KAN and 1.021 mm for BiGRU at point B, and 0.992 mm versus 1.117 mm at point E. Expressed relative to the BiGRU RMSE, the remaining differences are 6.7% and 11.2%, respectively (7.1% and 12.6% relative to the KAN RMSE, the convention used for the 8-neuron comparison). Capacity therefore benefits BiGRU more strongly. Yet KAN still leads under the same neuron budget, with lower RMSE and higher R² on both datasets. This pattern indicates greater width robustness for KAN and suggests that BiGRU requires larger hidden states to match the accuracy of KAN. KAN is thus well suited to the small-sample, low-capacity setting considered here.
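The relative gaps can be reproduced from the RMSE values quoted in the text. The short check below shows that the 8-neuron percentages follow when the gap is divided by the KAN RMSE, whereas the 64-neuron percentages follow when it is divided by the BiGRU RMSE; `pct_gap` is a hypothetical helper.

```python
def pct_gap(bigru_rmse, kan_rmse, base):
    """RMSE gap between BiGRU and KAN as a percentage of the chosen base."""
    return 100.0 * (bigru_rmse - kan_rmse) / base


# RMSE values (mm) quoted in the text: (KAN, BiGRU) per point and width.
rmse = {
    ("B", 8): (0.986, 1.325),
    ("E", 8): (1.122, 1.526),
    ("B", 64): (0.953, 1.021),
    ("E", 64): (0.992, 1.117),
}

for (point, width), (kan, bigru) in rmse.items():
    vs_kan = round(pct_gap(bigru, kan, kan), 1)
    vs_bigru = round(pct_gap(bigru, kan, bigru), 1)
    print(f"point {point}, {width} neurons: {vs_kan}% of KAN, {vs_bigru}% of BiGRU")
```

Using a single base throughout makes the two capacity regimes directly comparable: relative to KAN, the gap shrinks from roughly 34 to 36% at 8 neurons to roughly 7 to 13% at 64 neurons.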
3.4. Effectiveness Analysis of Surface Deformation Prediction
Following the validation of the generative module and the KAN architecture, this section evaluates the overall performance of the proposed TGAN-PIKAN framework on real-world deformation forecasting tasks. We used the final eight periods of surface deformation data (26 November 2022 to 18 February 2023) as the test set, with the remaining data as the training set. Unlike the capacity-sensitivity analysis, this fixed hold-out setting is designed to mimic a realistic deployment scenario with a strictly future test segment.
To prevent temporal data leakage, a strict ‘split-then-train’ protocol was adopted. The training and test sets were first separated chronologically, and all normalization parameters were fitted using the training split only and then applied to both splits. The TCN-TimeGAN was trained exclusively on the training split, and the synthetic samples were used only to augment the training data for forecasting; the test split was reserved solely for final evaluation. For each site, we considered two training sets: (i) original and (ii) augmented (generated per site). We conducted predictions on the original training set using KAN, BiGRU, and the widely used CNN-BiGRU model. On the augmented set, we evaluated KAN, BiGRU, and the proposed PI-KAN. The results for these predictions are denoted as TGAN-KAN, TGAN-BiGRU, and TGAN-PIKAN, respectively. The prediction results for each point are shown in
Figure 15.
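The split-then-train protocol can be sketched as below. The section does not say which normalization is used, so min-max scaling is an assumption here, and the helper names and the series are hypothetical placeholders. The key point is that the scaling parameters come from the training segment only.

```python
def holdout_split(series, n_test=8):
    """Reserve the final n_test acquisitions as a strictly future test segment."""
    if not 0 < n_test < len(series):
        raise ValueError("n_test must be between 1 and len(series) - 1")
    return series[:-n_test], series[-n_test:]


def fit_minmax(train):
    """Estimate scaling parameters from the training segment only."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("training segment is constant")
    return lo, hi


def apply_minmax(values, lo, hi):
    """Scale values with previously fitted parameters (no refitting)."""
    return [(v - lo) / (hi - lo) for v in values]


# Placeholder series with 49 acquisitions of steadily increasing subsidence.
series = [float(-t) for t in range(49)]
train, test = holdout_split(series, n_test=8)
lo, hi = fit_minmax(train)
train_scaled = apply_minmax(train, lo, hi)
test_scaled = apply_minmax(test, lo, hi)
print(len(train), len(test), min(test_scaled))
```

With ongoing subsidence, scaled test values can fall outside the [0, 1] range seen in training. This is expected when the parameters are fitted on past data only, and it is the price of avoiding leakage from the test segment.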
As shown in
Figure 15, the BiGRU, CNN-BiGRU, and KAN models trained on the original dataset generally reproduce the overall trends of surface deformation. However, noticeable deviations emerge during periods of rapid subsidence or near inflection points because of the limited number of training samples. Owing to its one-dimensional decomposition and B-spline-based nonlinear approximation, the KAN model shows stronger robustness, whereas CNN-BiGRU enhances local feature extraction during specific stages. Nevertheless, both models remain constrained by the limited sample size when characterizing complex, nonlinear, time-varying deformation features.
With the introduction of generative data augmentation, the overall prediction accuracy improves markedly. Both TGAN-KAN and TGAN-BiGRU outperform their counterparts trained on the original dataset, indicating that the generated samples compensate for deficiencies in the original sequences and enhance the models’ ability to learn more complete deformation patterns. Building on this improvement, incorporating physical constraints into the KAN architecture further enhances the stability of PI-KAN predictions, yielding smoother temporal variations and deformation trends that better reflect the actual deformation process. Experimental results show that, when supported by the augmented data, PI-KAN achieves the best performance in trend fitting and phase-specific deformation responses, while also showing superior agreement with the ground-truth SBAS-InSAR sequences.
The analysis above examined the prediction curves of the models on the test set. To quantify each method’s strengths and weaknesses from multiple perspectives, we assessed the models using five metrics: RMSE, MAE, MAPE, R², and Runtime. For TGAN-KAN, TGAN-BiGRU, and our proposed method, the computational cost of the TCN-TimeGAN training was explicitly incorporated into the total runtime metric. This approach offers a more comprehensive view of the time and resource requirements in a complete prediction workflow under practical conditions. The detailed quantitative comparisons of the models at each point appear in Figure 16.
Figure 16 summarizes point-wise prediction performance at the six monitoring sites using five criteria, including error measures, goodness of fit, and runtime. Smaller errors and shorter runtime are preferred, while a higher R² indicates a better fit.
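For reference, the four accuracy criteria can be computed as in the sketch below. It is an illustration, not the evaluation script used for Figure 16: MAPE is returned in percent, which is an assumption because the unit convention is not stated in this section, and the input values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (percent), and R² for paired observations."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an observed value is exactly zero.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}


# Hypothetical cumulative displacements (mm), not values from the study.
observed = [-100.0, -110.0, -120.0, -130.0]
predicted = [-101.0, -109.0, -121.0, -129.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Note that R² depends on the variance of the observed test segment, so over a short hold-out window with little variation it can be low even when absolute errors are small.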
Across sites, TGAN-PIKAN provides the strongest overall balance. It reduces errors and improves R² at most locations. Relative to KAN trained on the original samples, TGAN-KAN achieves lower errors, suggesting that generative augmentation improves generalization when observations are limited. Adding physics-informed regularization in TGAN-PIKAN brings a further gain and yields more stable site-level predictions. By comparison, BiGRU produces larger errors at several sites, and CNN-BiGRU is the most time-consuming. The average quantitative metrics for the overall experimental setup are summarized in Table 6.
As shown in Table 6, both TGAN-KAN and TGAN-BiGRU yield substantially lower average errors than the original KAN and BiGRU models without data augmentation. For instance, the RMSE of TGAN-KAN decreases to 1.317, and its MAPE drops from 0.741 (KAN) to 0.523. Its R² increases to 0.920, and TGAN-BiGRU achieves 0.875; both values are markedly higher than the baseline values. These results suggest that generative augmentation alleviates small-sample limitations. Building on these improvements, the proposed PI-KAN model delivers the best performance across all accuracy metrics. Although the training phase requires more time than the original models, the added computational cost is justified by the significant gains in accuracy and reliability. Overall, the method shows clear advantages for small-sample time-series prediction.
3.5. Model Interpretability Analysis
KAN improves interpretability by parameterizing nonlinear transformations with learnable univariate functions defined on network edges. In our forecasting setup, each lagged deformation value in the input window is treated as a separate input dimension. Its contribution is transmitted to the hidden units through edge functions. Each edge function is represented by B-spline basis functions defined on a fixed knot grid.
This experiment uses an input window length of 16, matching Section 3.4. For monitoring point A, we visualize the first-layer edge functions learned by PI-KAN. With a hidden width of 10, the first layer contains 160 univariate edge functions, corresponding to the 16 × 10 lag-to-hidden connections.
Figure 17 arranges these functions as a grid. The horizontal axis shows the normalized value of the relevant lagged input, and the vertical axis shows the output of the learned edge function. This view makes the nonlinear transformation applied to each lag explicit before the network aggregates the contributions.
Figure 17 offers an interpretable view of the spline mappings learned by PI-KAN across the input window. Each panel corresponds to one lagged time step and one spline channel. The curve shape and magnitude indicate how that temporal component is transformed before it contributes to the final deformation estimate. Large, structured responses imply a stronger nonlinear contribution, whereas nearly flat curves suggest a limited marginal effect.
The internal mappings are consistent with the physics-informed design of PI-KAN. The subsidence-oriented constraint discourages non-physical uplift responses, and the smoothness term reduces sharp curvature and high-frequency oscillations. Together, these regularizers promote stable mappings that remain physically plausible.
The enlarged panels at the bottom provide representative examples. Mappings at earlier time steps are typically smoother and less complex, consistent with an emphasis on longer-term background trends. By contrast, mappings closer to the forecast horizon show richer nonlinear structure, indicating a greater reliance on short-term dynamics when subsidence accelerates or changes regime. Overall,
Figure 17 illustrates how PI-KAN distributes temporal influence while preserving physically consistent trend evolution.
4. Discussion
4.1. Core Findings and Framework Positioning
Our results outline a coherent generation–prediction–interpretation pipeline. TCN-TimeGAN-augmented training combined with PI-KAN yields the most accurate and stable forecasts for sparse InSAR deformation time series, while preserving interpretability through spline-based internal mappings that reveal stronger short-term influence near the forecast horizon.
Recent studies have increasingly adopted attention-based and Transformer-based models for InSAR deformation forecasting to capture long-range temporal dependencies [52,53]. Such models are often effective when sequences are long, information-rich, and supplemented with additional covariates.
The setting addressed here is different. We focus on site-level forecasting from short and temporally sparse InSAR time series, with 49 observations per site. In this regime, the empirical training distribution is narrow, and purely data-driven black-box predictors can be sensitive to sampling variability. Our framework counters this limitation in two ways. Generative augmentation broadens the representativeness of the training set. Physics-informed learning then regularizes the predictor through subsidence-consistency and smoothness priors. The accuracy and stability gains in Table 6 suggest that these inductive biases are especially valuable when data are limited.
Future work will include stronger Transformer baselines evaluated under identical data splits and input information. This will help clarify the trade-off between attention models that typically benefit from richer data and physics-informed, parameter-efficient predictors designed for low-data forecasting.
4.2. Mechanism Analysis of the Physics-Informed Regularization
The effectiveness of the proposed framework can also be understood from the roles played by its different components. In PI-KAN, physics is introduced through soft penalties that discourage non-physical uplift and overly large curvature. This differs from classical physics-informed neural networks, which often enforce governing equations through PDE residuals. For mining subsidence forecasting with sparse line-of-sight observations, soft constraints are a practical choice. Fully specified geomechanical models typically require parameters that are difficult to identify from InSAR alone. In addition, InSAR time series can show small apparent uplifts caused by reference-point selection and residual atmospheric effects. Monotonicity and smoothness should therefore be viewed as stabilizing priors, not as strict physical laws.
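The two soft penalties described above can be sketched as follows. This is a minimal illustration, not the PI-KAN implementation: it assumes displacement values become more negative as subsidence progresses (so a positive step counts as uplift), and the function names, the weights `lam_mono` and `lam_smooth`, and the tolerance `tol` are hypothetical.

```python
def uplift_penalty(pred, tol=0.0):
    """Mean squared positive step beyond `tol`.

    Assumption: subsidence decreases the value, so a positive step is uplift.
    `tol` absorbs small apparent uplifts (e.g. reference-point or atmospheric effects).
    """
    steps = [b - a for a, b in zip(pred, pred[1:])]
    return sum(max(0.0, s - tol) ** 2 for s in steps) / len(steps)


def curvature_penalty(pred):
    """Mean squared second difference, a discrete proxy for curvature.

    Requires at least three predicted values.
    """
    second = [pred[i + 1] - 2 * pred[i] + pred[i - 1]
              for i in range(1, len(pred) - 1)]
    return sum(d * d for d in second) / len(second)


def physics_informed_loss(pred, target, lam_mono=0.1, lam_smooth=0.1, tol=0.0):
    """Data misfit plus the two soft penalties (weights are illustrative)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return (mse
            + lam_mono * uplift_penalty(pred, tol)
            + lam_smooth * curvature_penalty(pred))
```

Because both terms are added to the data misfit instead of being enforced exactly, a prediction with a small uplift is penalized but not forbidden, which matches the view of monotonicity and smoothness as stabilizing priors.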
These priors can be unsuitable when genuine rebound is present, for example during groundwater recovery or backfilling operations, or when observation biases dominate. In such cases, stage-aware or piecewise constraints that allow different regimes are preferable. Another promising direction is to incorporate explicit mechanical priors, including Probability Integral Method formulations and constitutive relations, when auxiliary mining and geotechnical information is available.
In this study, we monitor augmentation quality with MMD, DTW, and Coverage in Table 4. These criteria are useful, but they do not guarantee dynamic realism. For deployment, we recommend a more defensive workflow. All augmentation and regularization thresholds should be set using the training data only. Synthetic sequences should be screened to remove outliers, using DTW to the nearest real sequence or discriminator-based plausibility scores. Performance should also be stress-tested against the proportion of synthetic data used for training. These checks reduce the risk of unrealistic transitions, especially near segment boundaries and during rapid deformation.
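The DTW-based screening step could look like the sketch below. It uses the classic dynamic-programming DTW with an absolute-difference local cost; the function names and the `threshold` parameter are hypothetical, and the threshold is assumed to be chosen from the training data only (for instance, from the spread of DTW distances among real training sequences).

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def screen_synthetic(synthetic, real, threshold):
    """Keep synthetic sequences whose nearest real sequence is within `threshold`."""
    kept = []
    for s in synthetic:
        nearest = min(dtw_distance(s, r) for r in real)
        if nearest <= threshold:
            kept.append(s)
    return kept
```

A discriminator-based plausibility score could replace `dtw_distance` in the same filter without changing its structure.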
4.3. Methodological Advantages and Practical Relevance
From a deployment perspective, the runtime in Table 6 is dominated by training the generative module. This is consistent with the well-known cost of adversarial training. For near-real-time use, the overhead can be reduced by adopting lighter generator designs, exploiting parallel computation, and moving most training to periodic offline updates with incremental refresh. The predictor can then remain lightweight for online inference.
The spline-based KAN structure is also appealing for resource-limited settings. It can retain competitive accuracy at modest widths and provides transparent internal mappings.
In areas where leveling or GNSS ground truth is unavailable, prediction reliability can still be examined through time-split hindcast using held-out InSAR observations. Additional diagnostics can be performed using the same physical priors adopted in PI-KAN, such as monotonic subsidence tendency within a tolerance and smooth deformation evolution. When available, these checks may also be complemented by spatial consistency across neighboring monitoring points or consistency with mining activity records. These aspects will be explored in future multi-area deployments to further improve robustness.
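A time-split hindcast can be sketched as below. The persistence forecast is only a placeholder baseline standing in for a trained predictor, and all function names are hypothetical.

```python
import math


def time_split(series, n_holdout):
    """Chronological split: the last `n_holdout` observations are held out."""
    if not 0 < n_holdout < len(series):
        raise ValueError("n_holdout must be between 1 and len(series) - 1")
    return series[:-n_holdout], series[-n_holdout:]


def rmse(pred, obs):
    """Root-mean-square error between forecasts and held-out observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def persistence_forecast(history, steps):
    """Placeholder baseline: repeat the last observed value."""
    return [history[-1]] * steps


# Usage: hold out the final three acquisitions and score the baseline.
series = [0.0, -1.0, -2.0, -3.0, -4.0, -5.0, -6.0, -7.0, -8.0, -9.0]
train, held_out = time_split(series, 3)
score = rmse(persistence_forecast(train, len(held_out)), held_out)
```

The split is strictly chronological, so no held-out observation can leak into training, which is the property that makes the hindcast a usable substitute when leveling or GNSS data are missing.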
4.4. Limitations and Future Directions
The framework was evaluated using data from the Guqiao Coal Mine only; therefore, its generalizability to other mining districts with different seam depths, overburden structures, and mining methods remains to be verified. However, the six monitoring benchmarks are not confined to a single local setting. They are distributed across the subsidence basin influenced by two adjacent mining panels and exhibit clear differences in deformation magnitude and temporal evolution. This spatial heterogeneity provides a preliminary assessment of the model’s robustness under varying subsidence intensities.
Evidence beyond the Guqiao site is also suggested by our related study in the Banji mining area, where generative-model-based data augmentation was shown to improve small-sample InSAR time-series prediction [40]. Although that work did not employ the full framework proposed here, it provides additional support for the transferability of the augmentation concept. A systematic cross-mine evaluation of the complete framework, such as training on one mine and testing on another or performing leave-one-mine-out validation, will be pursued when multi-area datasets become available.
In this study, each benchmark provides 49 InSAR observations, which allows a sliding-window strategy to construct sufficient training segments. When the time series becomes extremely sparse, for example when fewer than 20 observations are available, the number of usable windows and the information available to constrain temporal dynamics are substantially reduced. As a result, the performance of both the generative and predictive components may deteriorate. Although generative augmentation improves performance in our experiments, it cannot guarantee improvement in all situations. Time-series GANs may propagate or amplify noise when the seed observations contain significant errors, and the generation process may smooth rare but important deformation behaviors if it is not sufficiently constrained.
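The effect of series length on the number of training segments can be made concrete with a small sliding-window sketch. The window length of 8 and the one-step horizon used in the usage lines are illustrative assumptions, not the settings of this study.

```python
def make_windows(series, window, horizon=1):
    """Return (input, target) pairs from a sliding window with stride 1.

    A series of length N yields N - window - horizon + 1 pairs
    (or none if the series is too short).
    """
    n_pairs = len(series) - window - horizon + 1
    if n_pairs <= 0:
        return []
    return [(series[i:i + window],
             series[i + window:i + window + horizon])
            for i in range(n_pairs)]


# Usage with an assumed window of 8 and a one-step horizon.
pairs_49 = make_windows(list(range(49)), window=8)
pairs_19 = make_windows(list(range(19)), window=8)
```

Under these assumed settings, 49 observations give 41 training pairs, whereas 19 observations give only 11, which illustrates how quickly the usable sample shrinks for very sparse series.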
In practice, missing acquisitions often lead to irregular temporal sampling. The current implementation treats the input as a sequence indexed by discrete time steps and therefore implicitly assumes regular sampling. When irregular intervals occur, the observations can be mapped onto a regular temporal grid through resampling or interpolation. A more general extension would be to incorporate the time gap between consecutive observations as an additional input feature so that the model can account for varying temporal intervals.
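Both options can be sketched briefly. The first function linearly interpolates irregular acquisitions onto a regular grid; the second pairs each observation with the gap since the previous acquisition so that it can be supplied as an extra input feature. The function names, the choice of linear interpolation, and the day-based time unit are assumptions made for illustration.

```python
def resample_linear(times, values, step):
    """Linearly interpolate onto the grid times[0], times[0] + step, ...

    Assumes at least two observations and strictly increasing `times`.
    Grid points beyond times[-1] are not generated.
    """
    grid, out = [], []
    j = 0  # index of the left endpoint of the current bracketing interval
    k = 0
    while times[0] + k * step <= times[-1]:
        t = times[0] + k * step
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        t0, t1 = times[j], times[j + 1]
        w = (t - t0) / (t1 - t0)
        out.append(values[j] + w * (values[j + 1] - values[j]))
        grid.append(t)
        k += 1
    return grid, out


def with_time_gaps(times, values):
    """Pair each value with the gap since the previous acquisition (0 for the first)."""
    gaps = [0] + [b - a for a, b in zip(times, times[1:])]
    return list(zip(values, gaps))


# Usage: one acquisition is missing between day 12 and day 36.
times = [0, 12, 36, 48]
values = [0.0, -1.0, -3.0, -4.0]
grid, regular = resample_linear(times, values, step=12)
features = with_time_gaps(times, values)
```

Interpolation keeps the existing model unchanged but invents values at missing dates, whereas the time-gap feature leaves the observations untouched and requires the model to accept a second input channel.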
Future work will validate the framework across diverse mining conditions and other deformation processes. It will also investigate conditional generation and forecasting that incorporate external drivers such as mining schedules, groundwater level, and rainfall. Robustness should be assessed under missing observations and varying revisit intervals.