3.1. Analysis of Station-to-Station Interpolation Results
For model performance evaluation, 30 stations are randomly selected as hidden validation points, with their distribution shown in Figure 1. The interpolation performance of the DG model is compared with that of the KCN model and the OK method. To reduce the randomness of single experiments, an ensemble test involving 100 random selections of hidden stations was conducted, with the shaded areas in the figures representing the standard deviation range across the experiments. Specifically, data from the first 90% of the time points serve as the training set X, and the KCN and OK methods are tested on the remaining 10% of time points. The DG model, with an input sequence length of h = 6 (i.e., the previous six days, t-6~t-1), outputs the interpolated values for the current day (t0). The newly predicted data are then iteratively fed back as inputs for subsequent forecasts. By leveraging graph neural networks to integrate the temporal dynamics and spatial distribution characteristics of meteorological variables, the model achieves accurate reconstruction and correction of data in unobserved regions.
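The iterative feedback described above can be pictured as a sliding window of length h = 6: each new estimate is appended and the oldest day is dropped. The following is a minimal sketch under stated assumptions, not the DG implementation. The name `step_fn` is a hypothetical stand-in for the trained model, `persistence_plus_one` is a toy function used only for demonstration, and the array layout (days by stations) is assumed.

```python
import numpy as np

def rollout(history, step_fn, n_steps, h=6):
    """Estimate n_steps consecutive days, feeding each estimate back as input.

    history : array (n_days >= h, n_stations) of prior values (assumed layout).
    step_fn : hypothetical stand-in for the trained model; maps a window of
              shape (h, n_stations) to an estimate of shape (n_stations,).
    """
    history = np.asarray(history, dtype=float)
    if history.shape[0] < h:
        raise ValueError("need at least h prior days")
    window = history[-h:].copy()              # days t-6 ... t-1
    outputs = []
    for _ in range(n_steps):
        estimate = np.asarray(step_fn(window), dtype=float)   # day t0
        outputs.append(estimate)
        # Drop the oldest day and append the new estimate as the latest day.
        window = np.vstack([window[1:], estimate[None, :]])
    return np.stack(outputs)                  # shape (n_steps, n_stations)

def persistence_plus_one(window):
    """Toy model for demonstration only: last day plus one."""
    return window[-1] + 1.0
```

Because every step consumes earlier estimates, errors can accumulate along the rollout, which is one reason to examine error curves over long periods.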
To visually compare the temporal stability of the model predictions over different time spans and to assess performance under extreme or transitional weather conditions, Figure 3 presents the temporal stability of temperature and precipitation forecasts. From May 2024 to January 2025, DG consistently achieves a significantly lower root mean square error (RMSE) in temperature predictions than both the OK and KCN models. Its average RMSE is 0.40 °C, compared with 1.21 °C for OK and 1.84 °C for KCN (Figure 3a). Examining the trend curves, although all three models experience slight error fluctuations in October due to seasonal climate transitions, DG exhibits noticeably smaller error fluctuations and a narrower standard deviation band. This indicates that DG maintains superior interpolation performance compared to KCN and OK, both during the high temperatures of summer and the gradually decreasing temperatures of autumn and winter.
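The ensemble statistics behind the shaded bands (100 random draws of 30 hidden stations, summarized by a mean and a standard deviation) can be sketched as follows. This is an illustration under assumptions, not the authors' code. The callable `interpolate` and its signature are hypothetical, and `visible_mean` is a deliberately trivial placeholder interpolator.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error over all elements."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def ensemble_rmse(obs, interpolate, n_hidden=30, n_trials=100, seed=0):
    """Mean and standard deviation of RMSE over random hidden-station draws.

    obs         : array (n_times, n_stations) of station observations.
    interpolate : hypothetical callable (obs_visible, visible_idx, hidden_idx)
                  returning estimates of shape (n_times, n_hidden).
    """
    obs = np.asarray(obs, dtype=float)
    rng = np.random.default_rng(seed)
    n_stations = obs.shape[1]
    scores = np.empty(n_trials)
    for k in range(n_trials):
        # Draw the hidden validation stations without replacement.
        hidden = rng.choice(n_stations, size=n_hidden, replace=False)
        visible = np.setdiff1d(np.arange(n_stations), hidden)
        pred = interpolate(obs[:, visible], visible, hidden)
        scores[k] = rmse(pred, obs[:, hidden])
    return float(scores.mean()), float(scores.std())

def visible_mean(obs_visible, visible_idx, hidden_idx):
    """Placeholder interpolator: the mean of the visible stations at each time."""
    field_mean = obs_visible.mean(axis=1, keepdims=True)
    return np.repeat(field_mean, len(hidden_idx), axis=1)
```

Applying the same routine per month instead of over the full period would give the centre line and band of a curve like those in Figure 3.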
The time series of precipitation RMSE further reveals the performance differences among the models under varying precipitation intensities (Figure 3b). DG achieves an average RMSE of 1.98 mm/day across the entire period, significantly lower than 3.45 mm/day for OK and 3.22 mm/day for KCN. Notably, during the rainy season from May to September, the errors of OK and KCN increase substantially as precipitation intensifies and then decrease markedly after October. As shown in Figure S1 (Supplementary Materials), the error varies across different seasons.
To compare the interpolation performance of the DG model, KCN model, and OK method across different terrains, Table 1 presents model evaluation results based on terrain subdivisions. Terrain complexity has a markedly different impact on the simulation accuracy of temperature and precipitation. For temperature, the low-elevation areas exhibit generally small errors for the DG and OK methods due to relatively flat topography, with DG achieving an average RMSE of 0.39 °C, slightly better than 0.40 °C for OK. However, the KCN model performs poorly in this region, with an RMSE as high as 1.32 °C. As elevation increases, temperature varies with altitude, leading to a marked increase in error for OK (rising to 1.11 °C) and consistently high error for the KCN model (1.37 °C). DG, however, substantially reduces estimation bias in high-elevation regions, achieving an RMSE of 0.85 °C, representing a 23.4% improvement over OK and an approximately 38% improvement over the KCN model.
In contrast, for the more spatially heterogeneous precipitation data, DG consistently outperforms both the OK method and the KCN model across different regions. Even in low-elevation areas, DG achieves an average RMSE of 4.71 mm/day, representing a 13.7% reduction compared to 5.46 mm/day for OK and a 16.8% reduction compared to 5.66 mm/day for the KCN model. Particularly in high-elevation regions with dramatic terrain variations, DG effectively overcomes the smoothing effect of traditional geostatistical methods under sparse station distribution, reducing the average RMSE from 7.04 mm/day (for OK) to 5.99 mm/day, a 14.9% decrease. Although the KCN model (6.10 mm/day) performs better than OK in high-elevation precipitation interpolation, it remains less accurate than the DG model. Overall, DG demonstrates stable performance in flat regions and exhibits stronger adaptability and superior predictive accuracy than traditional methods and other deep learning models in complex mountainous terrains.
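The percentage reductions quoted for precipitation follow from the relative change in RMSE, (baseline - model) / baseline. A short check using only the values given in the paragraph above (the function name is illustrative):

```python
def relative_reduction(baseline_rmse, model_rmse):
    """Percentage by which model_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse

# Precipitation RMSE values (mm/day) as quoted in the text.
low_vs_ok = relative_reduction(5.46, 4.71)     # low elevation, DG vs OK
low_vs_kcn = relative_reduction(5.66, 4.71)    # low elevation, DG vs KCN
high_vs_ok = relative_reduction(7.04, 5.99)    # high elevation, DG vs OK

print(f"{low_vs_ok:.1f}% {low_vs_kcn:.1f}% {high_vs_ok:.1f}%")
```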
3.2. Gridded-to-Station Interpolation
Since numerical forecasts and reanalysis datasets are primarily provided as gridded outputs, while operational applications often require station-based variable predictions, this study further evaluates the performance of the DG model, KCN model, and OK method for gridded-to-station interpolation in the Quanzhou area. Specifically, the DG model integrates ERA5 reanalysis data with continuous prior observations at stations (t-6~t-1) to perform spatiotemporal interpolation and modeling of station temperature and precipitation at the prediction time (t0). The experimental window remains from May 2024 to January 2025, and all meteorological stations within the study area are used to evaluate model performance.
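One way to picture the inputs described above is as paired samples: six prior days of station observations plus the reanalysis value at the target day. The sketch below is an assumption-laden illustration rather than the DG data pipeline. It assumes the ERA5 field has already been sampled at the station locations (for example, from the nearest grid cell), and the function name and array layout are hypothetical.

```python
import numpy as np

def build_samples(station_obs, era5_at_stations, h=6):
    """Pair each target day t0 with its h prior observation days and ERA5 at t0.

    station_obs      : array (n_days, n_stations) of station observations.
    era5_at_stations : array (n_days, n_stations); ERA5 values already sampled
                       at the station locations (an assumption of this sketch).
    Returns (history, background, truth) with shapes
    (n_samples, h, n_stations), (n_samples, n_stations), (n_samples, n_stations).
    """
    station_obs = np.asarray(station_obs, dtype=float)
    era5 = np.asarray(era5_at_stations, dtype=float)
    if station_obs.shape != era5.shape:
        raise ValueError("observation and ERA5 arrays must have the same shape")
    n_days = station_obs.shape[0]
    if n_days <= h:
        raise ValueError("need more than h days to form at least one sample")
    targets = range(h, n_days)                                  # every valid t0
    history = np.stack([station_obs[t - h:t] for t in targets])  # t-6 ... t-1
    background = era5[h:]                                       # ERA5 at t0
    truth = station_obs[h:]                                     # observation at t0
    return history, background, truth
```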
Figure 4 illustrates the monthly evolution of RMSE for station interpolation based on gridded data using the DG, KCN, and OK models. For temperature interpolation, the monthly mean RMSE is 0.43 °C for DG, 1.26 °C for OK, and 1.80 °C for KCN, with DG reducing the error by approximately 66% compared to OK and significantly outperforming the KCN model. For precipitation, despite the spatial heterogeneity caused by strong convective precipitation events in summer, DG achieves a monthly mean RMSE of 2.07 mm/day, significantly lower than 3.60 mm/day for OK and 3.34 mm/day for the KCN model. Notably, all three models exhibit peak errors in September, the month with the strongest rainfall during the rainy season, but DG maintains a mean RMSE of 3.78 mm/day and remains the most robust and accurate of the three models.
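A monthly RMSE series such as the one summarized above can be obtained by grouping daily squared errors by calendar month. The sketch below assumes daily arrays of shape (days, stations); the function name and return format are illustrative, not taken from the study.

```python
import numpy as np

def monthly_rmse(dates, pred, obs):
    """RMSE per calendar month, pooled over days and stations.

    dates : sequence of daily dates convertible to numpy datetime64[D].
    pred, obs : arrays (n_days, n_stations).
    Returns a dict mapping 'YYYY-MM' to the RMSE of that month.
    """
    dates = np.asarray(dates, dtype="datetime64[D]")
    months = dates.astype("datetime64[M]")      # truncate each date to its month
    sq_err = (np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)) ** 2
    result = {}
    for month in np.unique(months):
        in_month = months == month
        result[str(month)] = float(np.sqrt(sq_err[in_month].mean()))
    return result
```

Averaging the returned monthly values gives a monthly mean RMSE of the kind reported in the text.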
These results indicate that DG clearly outperforms both OK and the KCN model in gridded-to-station interpolation. A possible explanation is that ERA5 gridded data, as reanalysis fields, contain systematic model biases and insufficiently assimilated station observations, which can cause direct OK interpolation to produce errors at local stations. In contrast, DG deeply integrates prior station observations along with ERA5 data, which partially mitigates local biases in the reanalysis and demonstrates the reliability of DG for high-precision gridded-to-station interpolation.
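For readers less familiar with the OK baseline, the sketch below shows ordinary kriging in its textbook form: weights are obtained from a variogram-based linear system with a Lagrange multiplier that forces them to sum to one. It is not the configuration used in this study. The exponential variogram, its sill and range values, the Euclidean distance, and the absence of a nugget are all assumptions made for illustration.

```python
import numpy as np

def exp_variogram(dist, sill=1.0, range_km=50.0):
    """Exponential variogram without nugget (parameter values are assumed)."""
    return sill * (1.0 - np.exp(-np.asarray(dist, dtype=float) / range_km))

def ordinary_kriging(xy_known, z_known, xy_target, variogram=exp_variogram):
    """Ordinary kriging estimates at target points from known points.

    xy_known : (n, 2) coordinates of known points (e.g., grid cell centres).
    z_known  : (n,) values at the known points.
    xy_target: (m, 2) coordinates of target points (e.g., stations).
    Returns (estimates of shape (m,), weights of shape (n, m)).
    """
    xy_known = np.asarray(xy_known, dtype=float)
    z_known = np.asarray(z_known, dtype=float)
    xy_target = np.atleast_2d(np.asarray(xy_target, dtype=float))
    n = xy_known.shape[0]

    # Left-hand side: variogram between known points, bordered by ones.
    d_kk = np.linalg.norm(xy_known[:, None, :] - xy_known[None, :, :], axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = variogram(d_kk)
    lhs[n, n] = 0.0

    # Right-hand side: variogram between known and target points, plus a one
    # for the unbiasedness constraint.
    d_kt = np.linalg.norm(xy_known[:, None, :] - xy_target[None, :, :], axis=-1)
    rhs = np.ones((n + 1, xy_target.shape[0]))
    rhs[:n] = variogram(d_kt)

    solution = np.linalg.solve(lhs, rhs)
    weights = solution[:n]                 # last row is the Lagrange multiplier
    return weights.T @ z_known, weights
```

Because the estimate is a weighted average of the input values, any systematic bias in the gridded input is carried into the station estimate, which is consistent with the explanation offered above.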
To further compare the performance of the DG model, KCN model, and OK method, temperature and precipitation case studies are selected for detailed analysis: the temperature analysis on 7 July 2024, and the precipitation analysis on 15 June 2024. The predicted results are compared with station observations (OBS) and the corresponding ERA5 gridded background field data for the same periods.
As shown in Figure 5, for the temperature case on 7 July 2024, DG accurately reproduces the inland-cool and coastal-warm gradient pattern across the Quanzhou area (Figure 5b), achieving an RMSE of only 0.41 °C. In contrast, although OK captures the main spatial distribution, its RMSE reaches 1.58 °C, and the KCN model also performs unsatisfactorily in this case, with an RMSE of 1.94 °C, failing to effectively correct the biases in the background field. Comparing the differences between the methods and the observations (Figure 6) reveals that DG exhibits a more uniform error distribution relative to ERA5, indicating that it corrects local station biases while maintaining high physical consistency with the large-scale background field. The largest biases occur mainly in the northeastern part of Quanzhou and some coastal stations, with local deviations reaching up to 11.2 °C.
For the precipitation case on 15 June 2024, DG achieves an RMSE of 11.79 mm/day, whereas OK yields an RMSE of 31.29 mm/day and the KCN model yields an RMSE of 18.90 mm/day. Observations indicate a pronounced heavy rainfall zone along the southeastern coast of Quanzhou, exhibiting significant spatial heterogeneity. However, the OK analysis (Figure 5g) is overly smoothed and underestimates the overall values, mainly because the ERA5 reanalysis data fail to capture the intense rainfall event in the southeast. Although the KCN model (Figure 5h) outperforms the OK method to some extent by capturing partial precipitation signals, it fails to accurately reconstruct the rainfall intensity in the high-value zone. In contrast, DG (Figure 5f) successfully predicts this high-precipitation area, with a spatial distribution closely matching the observations, thereby more accurately reflecting both the intensity and spatial pattern of the localized heavy rainfall event. Differences between the predictions of the methods and the observations (Figure 6) further highlight the DG model’s advantage in capturing this extreme precipitation, while both the OK method and the KCN model exhibit varying degrees of pronounced error clustering in the southeastern part of Quanzhou, particularly in the heavy rainfall region. Together, the interpolation results from these two cases demonstrate that DG, by leveraging its inherent strengths and incorporating prior station observations, achieves substantially better predictive performance than the OK method and conventional deep learning models for both temperature fields and precipitation fields with localized heavy rainfall.
To quantitatively assess the contribution of spatiotemporal coordination in the model to meteorological interpolation accuracy and spatial pattern reconstruction, the interpolation performance of the full DG model is compared with two ablation variants: a model retaining only spatial information (OSI) and a model retaining only temporal information (OTI). OSI focuses on learning the interactions among grid points at each time step by modeling the dependencies between the target location and surrounding stations, while neglecting prior temporal evolution along the time axis. In contrast, OTI removes spatial connections among stations and relies solely on learning the temporal evolution of historical observations at individual stations to predict meteorological variables at the current time step.
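The difference between the three variants can be illustrated by which inputs each one is allowed to see. The sketch below is a hypothetical simplification, not the architecture used in the study. The k-nearest-neighbour graph, the function names, and the idea of expressing the ablations as input masks are all assumptions made for illustration.

```python
import numpy as np

def knn_adjacency(xy, k=3):
    """Symmetric boolean adjacency linking each point to its k nearest neighbours."""
    xy = np.asarray(xy, dtype=float)
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)             # exclude self-links
    neighbours = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros(dist.shape, dtype=bool)
    adj[np.arange(xy.shape[0])[:, None], neighbours] = True
    return adj | adj.T                         # make the graph undirected

def ablation_inputs(adj, window, variant):
    """Return the (adjacency, history window) a given variant may use.

    adj     : (n, n) boolean spatial adjacency.
    window  : (h, n) prior observations, days t-6 ... t-1.
    variant : 'DG' (space and time), 'OSI' (space only), 'OTI' (time only).
    """
    if variant == "DG":
        return adj, window
    if variant == "OSI":
        return adj, window[:0]                 # spatial links kept, history dropped
    if variant == "OTI":
        return np.zeros_like(adj), window      # history kept, spatial links removed
    raise ValueError(f"unknown variant: {variant}")
```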
Figure 7a–d visually illustrate the super-resolution reconstruction capability of different models for the spatial distribution of temperature on 2 July 2024. Observations indicate a pronounced northwest-cool and southeast-warm temperature pattern across Quanzhou, with temperatures generally below 30 °C in the high-elevation mountainous regions of the northwest. Among all compared models, the DG model (Figure 7b) shows the highest similarity to the observed temperature field, accurately capturing the southeast–northwest-oriented warm tongue structure and achieving the lowest mean RMSE. In contrast, OSI exhibits a clear systematic cold bias with weak horizontal spatial contrasts, resulting in a high RMSE of 6.46 °C. OTI, while able to reproduce the general southeast–northwest temperature gradient, displays pronounced over-smoothing, leading to a substantially smaller extent of the cold center compared to the observations.
Figure 7e–h compare the spatial patterns of the precipitation fields. The daily accumulated precipitation on 15 June 2024 further highlights the differences among the models in capturing localized heavy rainfall events. By effectively integrating prior temporal evolution signals with the spatial distribution information from reanalysis data at the current time step through the graph neural network, DG successfully reproduces both the exact location and intensity of the heavy precipitation center. OSI exhibits pronounced over-smoothing, leading to the loss of key features of the heavy rainfall center. This behavior is likely related to the limited capability of the ERA5 reanalysis data to represent this event; nevertheless, its RMSE (26.41 mm/day) remains lower than that of OK (31.29 mm/day). OTI captures the general spatial distribution and high-value bands of precipitation reasonably well, but its overall RMSE (12.48 mm/day) is still higher than that of DG. In addition, the precipitation center predicted by OTI is shifted relative to the observations, and its peak intensity is underestimated.
The experimental results from the two case studies demonstrate that DG, which simultaneously integrates multidimensional spatiotemporal information, effectively corrects the prediction biases arising from reliance solely on the temporal evolution of prior station observations or on the spatial distribution of reanalysis data at the current time step. By repeatedly learning the evolutionary trends of historical observations and incorporating the spatial patterns of surrounding reanalysis grid data, DG more accurately reconstructs both the location and intensity of localized heavy precipitation centers.
To comprehensively evaluate the predictive performance of different methods, Figure 8a–d present the long-term spatial distribution characteristics of temperature interpolation errors for each approach. Among them, DG performs best, achieving an average RMSE of 0.35 °C, indicating that it consistently maintains high prediction accuracy in both topographically complex inland mountainous areas and coastal regions influenced by land–sea interactions. In contrast, although OK shows generally good performance (with an average RMSE of 0.87 °C), regions with relatively large errors appear along the northeastern coastal margins of Quanzhou and in parts of the central mountainous areas, where the maximum RMSE exceeds 7 °C. This is likely associated with sparse station coverage or boundary effects, which tend to amplify interpolation errors. OSI exhibits larger biases than the other methods, with widespread high-error regions, indicating that relying solely on spatial distance weighting is insufficient to capture regional temperature variability. Although OTI incorporates temporal information from prior station observations, its error distribution is highly uneven. By neglecting the geographical relationships among stations, the model forms isolated high-error zones around multiple stations, suggesting a limited ability to represent the spatial continuity of temperature fields.
Figure 8e–h depict the long-term spatial distribution of precipitation errors. The long-term error patterns further confirm the capability of DG in analyzing precipitation fields with strong spatiotemporal heterogeneity. Specifically, DG achieves the lowest precipitation prediction error, with an average RMSE of 3.10 mm/day. OK yields an average RMSE of 6.50 mm/day, which is approximately 210% of that of DG, indicating substantially poorer interpolation performance, particularly in the central and northern mountainous regions of Quanzhou. OSI exhibits a high average RMSE of 8.44 mm/day. This is mainly because precipitation is highly discontinuous in space, and OSI, which relies solely on spatial distance weighting, tends to average strong rainfall areas with surrounding rain-free regions, leading to pronounced over-smoothing and an underestimation of precipitation peaks. In contrast, OTI performs better than OSI (RMSE of 3.59 mm/day), likely because it captures the temporal persistence of precipitation from prior station observations. However, due to the absence of explicit geographical information, the error distribution of OTI appears as isolated patchy patterns, indicating persistent biases in predicting the spatial displacement of precipitation systems.
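Long-term error maps of this kind rest on a per-station RMSE computed along the time axis, which can then be summarized by its mean and maximum. The sketch below assumes arrays of shape (days, stations) with missing values stored as NaN; the function names are illustrative.

```python
import numpy as np

def station_rmse(pred, obs):
    """RMSE at each station over the full period, ignoring missing days (NaN)."""
    err = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    return np.sqrt(np.nanmean(err ** 2, axis=0))

def summarize(per_station):
    """Mean and maximum of the per-station RMSE, plus the worst station index."""
    per_station = np.asarray(per_station, dtype=float)
    return {
        "mean": float(np.nanmean(per_station)),
        "max": float(np.nanmax(per_station)),
        "worst_station": int(np.nanargmax(per_station)),
    }

# Ratio of the average precipitation RMSE values quoted in the text (OK / DG).
ok_to_dg_percent = 100.0 * 6.50 / 3.10
```

The ratio evaluates to roughly 210%, matching the figure quoted for OK relative to DG.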