1. Introduction
Accurate runoff prediction, especially short-term runoff prediction, is critical for water resource management. However, due to human impacts on land and atmospheric systems, runoff exhibits high spatiotemporal variability and uncertainty, making it challenging to effectively capture the dynamics of short-term runoff time series [1]. These issues are exacerbated by anthropogenic effects and climate variability, which further increase the spatiotemporal variability and unpredictability of runoff patterns [2,3,4,5]. To address these challenges, researchers have developed a diverse set of prediction models, which can be roughly classified as physics-based models [6,7,8,9] and data-driven models [10,11,12,13].
Historically, physics-based models have been the foundation of hydrological prediction. These models replicate complicated, nonlinear hydrological processes by forecasting climatic variables (such as precipitation) and calculating runoff using rainfall–runoff models [14]. Such models adequately explain the underlying physical mechanisms and are highly reliable. For example, the Soil and Water Assessment Tool (SWAT) has been combined with the ArcGIS interface to produce a two-dimensional basin-based hydrological model that has been widely used for runoff and water quality modeling in agricultural areas [15]. MIKE SHE and MIKE 3 perform three-dimensional simulations of surface water flow and sediment transport in urban, coastal, and marine environments [16]. Despite their utility, physics-based models are frequently limited by the requirement for high-quality, extensive physical data to solve the governing physical equations. In areas with limited observational networks or poor data quality, their use can be difficult, limiting their usefulness and scalability [17]. In contrast, data-driven hydrological models place less emphasis on the description of physical processes in the hydrological cycle, instead focusing on the correlations between input and output variables learned from data [18,19,20]. These models are highly efficient in learning, can be quickly applied to practical scenarios, and demonstrate good adaptability even in basins with insufficient spatial information or short temporal records. Support Vector Machines (SVMs) are models that minimize output errors by applying linear or nonlinear kernel weighting to input variables [21]. SVMs, like linear regression, can model individual time steps in a time series, and they outperform physical models, such as the MIKE Flood and Storm Water Management Model (SWMM) [22], in terms of modeling speed and accuracy.
Data-driven hydrological models are often mappings between many input features and output targets. To use typical linear and nonlinear machine learning techniques, the data must be reframed for time-series forecasting. To handle input sequences, researchers developed Recurrent Neural Network (RNN) models, such as LSTM and GRU [23,24,25,26,27], which have been successfully used in a variety of investigations. LSTM models have been employed in soil moisture modeling [28], groundwater level prediction, and daily or hourly rainfall–runoff modeling [29,30]. GRU models have been employed in short-term runoff prediction [18], and when combined with techniques such as Variational Mode Decomposition (VMD) [31], they have demonstrated excellent predictive performance.
However, standalone data-driven models face their own set of challenges. While they excel at capturing relationships within historical data, their predictive accuracy often deteriorates with increasing lead times [32,33]. This problem arises because these models rely mainly on past runoff data and ignore future climatic variables, which are important for effective runoff prediction [34]. During runoff generation, hydro-meteorological interactions are very close [35]. In most catchments, runoff is primarily produced either through direct precipitation or snowmelt-driven processes [36]. As such, changes in precipitation or snowmelt are typically positively correlated with runoff volumes: an increase in these inputs often leads to greater runoff, and vice versa. Rising temperatures can enhance potential evapotranspiration, which is particularly impactful in arid regions, where limited precipitation leads to a substantial reduction in water availability for runoff generation [37]. Moreover, extreme weather events can significantly alter runoff dynamics, potentially resulting in hydrological extremes, such as floods or droughts [38]. In scenarios where reliable meteorological forecasts are unavailable [39], this dependence on historical data becomes a significant bottleneck.
Recent breakthroughs in machine learning and meteorological modeling have opened new avenues for addressing these limitations [40]. The emergence of AI-driven weather prediction models, such as Pangu-Weather, has revolutionized the field by significantly enhancing the accuracy and efficiency of meteorological forecasts [39]. Pangu-Weather uses a three-dimensional Earth-Specific Transformer to process meteorological data, integrating spatial and temporal information with unprecedented precision. By mitigating cumulative forecast errors via hierarchical temporal aggregation strategies, this model has outperformed traditional Numerical Weather Prediction (NWP) systems in both speed and accuracy. Leveraging such AI-driven meteorological forecasts in hydrological models can account for uncertainties in future conditions, thereby enhancing the robustness and reliability of runoff predictions.
This study presents a novel approach to runoff forecasting that combines Pangu-Weather, LSTM, and GRU. The LSTM-Pangu and GRU-Pangu models are designed to capitalize on the strengths of both AI-driven meteorological forecasts and data-driven hydrological modeling. By incorporating meteorological predictions into the runoff forecasting process, these models seek to overcome the constraints of classic LSTM and GRU models, particularly in scenarios involving extended lead times or extreme flow conditions.
The paper is arranged as follows. Section 2 describes the LSTM, GRU, and Pangu-Weather models and their structures, followed by the model evaluation criteria. Section 3 provides an overview of the study area, data, and model configuration. Section 4 includes the modeling results and discussions. Finally, Section 5 summarizes this paper.
2. Materials and Methods
2.1. Long Short-Term Memory (LSTM)
LSTM is a form of Recurrent Neural Network (RNN) that was developed to address the shortcomings of standard RNN models in handling long-term dependencies. A typical RNN struggles to effectively preserve and use long-term dependencies in sequential data, as earlier information often loses influence during propagation. In contrast, LSTM, through its unique memory cell structure, excels at preserving and transmitting long-term information [41].
Figure 1 illustrates the basic structure and operational flow of an LSTM unit. At each time step, an LSTM unit maintains a distinct state, known as the cell state, in which information can be stored. The time-series input is shown in Figure 1 as $x_t$, while the output is shown at the top as $h_t$.
Each LSTM cell updates six quantities at each time step. The required steps are described in Equations (1)–(6) [42]:

$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (1)
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ (2)
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$ (3)
$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ (4)
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ (5)
$h_t = o_t \odot \tanh(C_t)$ (6)

The first parameter is the forget gate ($f_t$), which controls how much information from the previous cell state is forgotten. The linear operations at the different steps have their own weight matrices ($W$) and biases ($b$). The closer the forget gate value ($f_t$) is to 0 after the sigmoid function ($\sigma$), the more information from the previous cell state is forgotten. The second parameter is the input gate ($i_t$), which specifies what new information is retained and supplied to the cell state at the current time step; it is obtained by applying the sigmoid function to a linear combination of $h_{t-1}$ and $x_t$. Meanwhile, the candidate cell state ($\tilde{C}_t$) is calculated by applying the tanh (hyperbolic tangent) activation function to a linear combination of $h_{t-1}$ and $x_t$. Next, the cell state ($C_t$) is updated by combining the information retained through the forget gate with the new information from the input gate. The output gate ($o_t$) is then computed by performing a linear operation on $h_{t-1}$ and $x_t$ and applying the sigmoid function. Finally, the output ($h_t$) of the current time step is the product of $o_t$ and the tanh of the cell state ($C_t$).
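For concreteness, the single-step computation described by Equations (1)–(6) can be written out directly. The following NumPy sketch is illustrative only (the models in this study use the Keras LSTM layer described below), and the function and parameter names are our own:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step following Equations (1)-(6).

    W maps the concatenated vector [h_{t-1}, x_t] to each gate
    (forget, input, candidate, output); b holds the corresponding biases.
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # (1) forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # (2) input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # (3) candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # (4) cell-state update
    o_t = sigmoid(W["o"] @ z + b["o"])        # (5) output gate
    h_t = o_t * np.tanh(c_t)                  # (6) hidden state / output
    return h_t, c_t
```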
In this study, given the time cost of converting meteorological factors into runoff and the temporal relationship between past and future runoff, LSTM is well suited as a neural network approach for handling time-series data, such as meteorological and historical runoff data. The LSTM layer is a standard component in many recent machine learning packages [43], making it convenient to use. As a result, the relevant models in this study are implemented using the built-in LSTM layer of the Keras framework.
2.2. Gate Recurrent Unit (GRU)
The LSTM network was first proposed in 1997 and applied to language processing tasks. It is well known for its extraordinary capacity to capture long-term and short-term dependencies. However, because of their relatively complex structure, LSTM neural networks often take longer to construct and train. To overcome this issue, GRU networks were proposed as a simplification of the LSTM network, with a simpler topology [44].
The structure and operational flow of GRU units, shown in Figure 2, differ from LSTM in that the hidden state and cell state are merged into a single state and fewer gates are used for control. In the GRU cell, there are two control gates: the update gate ($z_t$) and the reset gate ($r_t$). The update gate ($z_t$) controls how much information from the previous state ($h_{t-1}$) is transferred to the current time step; the higher the value of the update gate ($z_t$), the more information is transmitted. The reset gate ($r_t$) determines how much information from the previous state is transferred to the current candidate state; the lower the value of the reset gate, the less information is transferred from the previous state. The gate activations are calculated as follows:

$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$
$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$

where $W_z$, $U_z$, $W_r$, and $U_r$ are the weight vectors of the network; $b_z$ and $b_r$ are the network bias vectors; and $z_t$ and $r_t$ are vectors containing the activation values of the update and reset gates.
The GRU features a simpler structure and faster training while handling long sequence data, and it is suitable for smaller datasets; however, its ability to capture long-term dependencies is slightly inferior to that of LSTM. In this study, considering the relationship between meteorological factors, past runoff, and future runoff over time, the GRU model is also suitable for application. The GRU layer is a standard component in many recent machine learning packages [43], making it convenient to use. Therefore, this study implements the relevant models using the built-in GRU layer of the Keras framework.
2.3. Pangu-Weather
In recent years, AI technology has opened a new pathway for weather forecasting, significantly improving speed compared to traditional methods. However, the forecasting accuracy of most existing AI models still lags behind that of Numerical Weather Prediction (NWP) systems. Nevertheless, Pangu-Weather, proposed by Bi et al. (2023) [39], outperforms current NWP systems in both accuracy and speed.
The structure of Pangu-Weather is shown in Figure 3. By incorporating the vertical dimension (height) into the neural network, the model constructs a 3D architecture capable of explicitly modeling interactions across different atmospheric pressure layers. The 3D data are processed by an encoder–decoder design based on the Swin transformer [45]. To better integrate Earth-specific physical constraints, the researchers replaced the original relative positional bias in Swin with geophysical positional biases, enabling a more accurate spatial encoding of atmospheric variables.
The model partitions atmospheric data into 13 vertical pressure levels (e.g., 500 hPa, 850 hPa) and multiple latitudinal zones (e.g., tropics, mid-latitudes), which helps reflect the geospatial dependencies of atmospheric dynamics. Traditional relative positional encodings are insufficient for capturing latitude-dependent Coriolis effects, vertical coupling across pressure layers (such as interactions between upper-level jet streams and surface wind fields), and land–sea contrasts that influence weather systems (e.g., monsoons). To address this, the model assigns independent positional bias parameters to each pressure level and latitude band, explicitly encoding their absolute spatial relationships. Although this geographic specialization increases the number of bias parameters by a factor of 527 compared to the original Swin architecture, it significantly improves the model’s ability to predict extreme weather events, such as typhoon trajectories, demonstrating the value of incorporating physical priors.
Compared to the baseline, the 3D Earth-Specific Transformer (3DEST) has the same computational cost but a faster convergence speed. To reduce cumulative forecast errors, the authors introduced hierarchical temporal aggregation, a strategy that always applies the deep neural network with the longest available lead time; this substantially decreases the number of iterations. This design significantly enhances the predictive capability of the model across different spatial scales, achieving much higher accuracy than 2D models, such as FourCastNet [46].
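The aggregation step can be illustrated with a small greedy sketch. Assuming, as in the original Pangu-Weather setup, that separate models are trained for 24 h, 6 h, 3 h, and 1 h lead times, a forecast of any horizon is composed by repeatedly applying the model with the longest lead time that still fits; the function name and structure here are illustrative:

```python
def plan_aggregation(target_hours, available=(24, 6, 3, 1)):
    """Greedy hierarchical temporal aggregation: cover the target lead time
    with as few model applications as possible, preferring long lead times."""
    steps = []
    remaining = target_hours
    for lead in sorted(available, reverse=True):
        while remaining >= lead:
            steps.append(lead)
            remaining -= lead
    return steps

# Example: a 31 h forecast is built as 24 h + 6 h + 1 h (3 iterations)
print(plan_aggregation(31))  # [24, 6, 1]
```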
In this study, the version of the Pangu model described by Bi et al. (2023) [39] was adopted, featuring a spatial resolution of 0.25° × 0.25° and a temporal resolution of 24 h. This version was trained using 39 years of ERA5 reanalysis data from 1979 to 2017. After training, the model operates as a predictive system and produces forecasts or simulations given initial conditions. The input data for the Pangu-Weather model include surface variables and upper-air variables. The surface variables (input_surface.npy) form a NumPy array with a shape of (4, 721, 1440), representing four surface variables (mean sea-level pressure, 10 m u-component of wind speed, 10 m v-component of wind speed, and 2 m temperature). The upper-air variables (input_upper.npy) form a NumPy array with a shape of (5, 13, 721, 1440), representing five upper-air variables (Z, Q, T, U, V) at 13 pressure levels, where Z, Q, T, U, and V denote the geopotential, specific humidity, temperature, and the u- and v-components of wind speed, respectively.
2.4. Pangu-Driven Runoff Prediction Model
The prediction framework, shown in Figure 4, combines the Pangu model with multiple runoff predictor sub-models designed to forecast runoff for successive days. The framework consists of m sub-models that predict runoff for the next m days. These sub-models are not independent; the output runoff of the i-th sub-model serves as an input condition for the (i + 1)-th sub-model, so each step of the prediction process relies on the prediction from the previous step. The simple version of the first sub-model (Model 1) is defined as follows:

$\hat{Q}_{n+1} = f_1(X_{1:n})$

and that is

$\hat{Q}_{n+1} = f_1(M_{1:n}, Q_{1:n})$

where n is the time step, $\hat{Q}_{n+1}$ is the predicted runoff for day (n + 1), and $X_{1:n}$ represents the observed data from day 1 to day n. $M_{1:n}$ represents the observed meteorological data for the past n days, and $Q_{1:n}$ represents the observed runoff data for the past n days. The runoff for day (n + 1) is therefore predicted using both historical runoff data and historical meteorological data.
Next, the simple version of the m-th sub-model (Model m) is defined as follows:

$\hat{Q}_{n+m} = f_m(X_{m:n}, \hat{X}_{n+1:n+m-1})$

and that is

$\hat{Q}_{n+m} = f_m(M_{m:n}, Q_{m:n}, \hat{M}_{n+1:n+m-1}, \hat{Q}_{n+1:n+m-1})$

where m is the lead time (m < n); $\hat{Q}_{n+m}$ is the predicted runoff for day (n + m); $X_{m:n}$ are the observed data from day m to day n; and $\hat{X}_{n+1:n+m-1}$ are the predicted values from the first to the (m − 1)-th sub-model. $\hat{M}_{n+1:n+m-1}$ are the meteorological data predicted by the Pangu model for lead times of 1 to (m − 1) days, while $\hat{Q}_{n+1:n+m-1}$ are the runoff data predicted by the first to the (m − 1)-th sub-models. The runoff for day (n + m) is thus predicted using (n − m) runoff observations, (n − m) meteorological observations, (m − 1) runoff predictions, and (m − 1) meteorological predictions.
Therefore, these m models for runoff prediction over m steps are not independent. For example, the first sub-model needs to be trained first, serving two purposes: predicting the runoff for the next day and providing input for the sub-models at steps 2, 3, …, m, and so on.
The Pangu-Weather model takes the observed meteorological fields as input, makes predictions, and feeds the predicted values into each sub-model. The meteorological data include 2 m temperature and specific humidity values at the 50 hPa and 100 hPa levels for 77 grid points in the study region.
Within the model, the initial focus is on daily runoff prediction, which is gradually extended to multi-day predictions. For example, with a 3-day lead time and a 5-day time step, Model 1 forecasts runoff for the first day using historical meteorological and runoff data from the previous 5 days, while Model 2 forecasts runoff for the 2nd day using historical runoff and meteorological data from the previous 4 days, as well as the 1st-day runoff prediction and Pangu’s meteorological data forecast. Model 3 predicts the runoff for the third day using the historic runoff information and meteorological information from the previous three days, along with the runoff predictions for the first and second days from Models 1 and 2, and the forecast of Pangu for the first and second days’ meteorological data.
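A compact sketch of this rolling scheme under the worked example above (a 5-day window and a 3-day lead time) is given below; the sub-models are represented as generic trained predictors, and all names and array layouts are illustrative:

```python
import numpy as np

def rolling_forecast(sub_models, met_obs, runoff_obs, met_preds):
    """Rolling multi-step runoff prediction.

    sub_models : list of m trained predictors; sub_models[k] forecasts lead day k+1.
    met_obs    : observed meteorology for the last n days, shape (n, n_met).
    runoff_obs : observed runoff for the last n days, shape (n,).
    met_preds  : Pangu-predicted meteorology for the next m-1 days, shape (m-1, n_met).
    """
    runoff_preds = []
    for k, model in enumerate(sub_models):
        # Model k+1 drops the k oldest observed days and appends k predicted days
        met_in = np.vstack([met_obs[k:], met_preds[:k]]) if k else met_obs
        run_in = np.concatenate([runoff_obs[k:], runoff_preds[:k]]) if k else runoff_obs
        features = np.column_stack([run_in, met_in])      # (n, 1 + n_met)
        y_hat = model.predict(features[np.newaxis, ...])  # Keras-style call
        runoff_preds.append(float(np.squeeze(y_hat)))
    return runoff_preds  # runoff for lead days 1..m
```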
The sub-models were implemented as either LSTM or GRU networks, yielding the LSTM-Pangu and GRU-Pangu models, which were used as the experimental models. To evaluate their performance, they were compared against two baseline models, the LSTM and the GRU, both of which forecast future runoff using runoff and meteorological data from prior days.
2.5. Performance Evaluation Methods
This study evaluated model performance using four metrics: Nash–Sutcliffe Efficiency (NSE), Pearson Correlation Coefficient (R), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). NSE and R assess the accuracy and consistency of model predictions, while MAE and RMSE quantify the average deviation between predicted and observed values. Model performance improves as the NSE value approaches one, or as the MAE and RMSE values approach zero [47]. The formulas for the four metrics are presented below:

$NSE = 1 - \dfrac{\sum_{i=1}^{N} (Q_i^{obs} - Q_i^{pre})^2}{\sum_{i=1}^{N} (Q_i^{obs} - \bar{Q}^{obs})^2}$

$R = \dfrac{\sum_{i=1}^{N} (Q_i^{obs} - \bar{Q}^{obs})(Q_i^{pre} - \bar{Q}^{pre})}{\sqrt{\sum_{i=1}^{N} (Q_i^{obs} - \bar{Q}^{obs})^2}\,\sqrt{\sum_{i=1}^{N} (Q_i^{pre} - \bar{Q}^{pre})^2}}$

$MAE = \dfrac{1}{N}\sum_{i=1}^{N} \left|Q_i^{obs} - Q_i^{pre}\right|$

$RMSE = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N} (Q_i^{obs} - Q_i^{pre})^2}$

where N is the total length of the data; $Q_i^{obs}$ is the observed daily runoff; $Q_i^{pre}$ is the predicted daily runoff; $\bar{Q}^{obs}$ is the mean observed runoff; and $\bar{Q}^{pre}$ is the mean predicted runoff.
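For reference, these four metrics can be computed directly from the observed and predicted series; a minimal NumPy sketch (function and variable names are illustrative):

```python
import numpy as np

def evaluate(obs, pred):
    """Return NSE, R, MAE, and RMSE for observed and predicted runoff series."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    nse = 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)
    r = np.corrcoef(obs, pred)[0, 1]           # Pearson correlation coefficient
    mae = np.mean(np.abs(obs - pred))
    rmse = np.sqrt(np.mean((obs - pred) ** 2))
    return {"NSE": nse, "R": r, "MAE": mae, "RMSE": rmse}
```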
3. Overview of the Study Area and Model Configurations
3.1. Study Area and Data
The Beipan River Basin is located in southwestern China, spanning the Yunnan and Guizhou provinces. It originates in the Wumeng Mountains of Yunnan and flows through the transitional slope zone from the Yunnan–Guizhou Plateau to the Guangxi hills, ultimately joining the Hongshui River in Wangmo County, Guizhou. The basin lies between longitudes 103°50′ and 106°20′ E and latitudes 24°51′ and 26°45′ N. The main river stretches for 449 km with a total elevation drop of 1985 m, making it the river with the greatest fall in the Pearl River Basin. The geographical location of the basin is shown in Figure 5.
The basin is predominantly characterized by karst topography, featuring deeply incised gorges, such as the Huajiang and Ye Zhong Grand Canyons, with vertical dissection depths ranging from 400 to 1400 m. Due to the high permeability of the karst formations, baseflow in the basin is relatively low compared to non-karst regions, while peak flows typically occur during the flood season. Runoff during the wet season (May to October) accounts for approximately 84% of the annual total, reflecting a high seasonal variability between flood and dry periods [48].
The Guangzhao Hydropower Station is the largest hydropower facility in the basin. It has a normal water storage level of 745 m, a total reservoir capacity of 3.245 billion cubic meters, and an installed capacity of 1040 megawatts, representing about 39% of the basin’s total installed capacity. The dam site is located in a deeply incised valley with exposed bedrock on both sides. The region experiences a subtropical monsoon climate with an average annual precipitation of 1178.8 mm, and peak flood discharges typically occur between June and July.
The Dongqing Hydropower Station is a key downstream facility within the Beipan River Basin, located below the Guangzhao Hydropower Station. It primarily serves power generation, flood control, and water regulation functions. The station has a normal water level of approximately 760 m, a total reservoir capacity of about 1.86 billion cubic meters, and an installed capacity of 600 megawatts. The dam is situated in a mountainous karst region with steep terrain and significant elevation differences. As with other parts of the basin, the area experiences pronounced seasonal runoff variation.
This study selected daily runoff data from the upstream of Dongqing Hydropower Station in the Beipan River Basin, covering the period from 16 April 2015 to 23 May 2023, as the sample dataset. According to the research requirements, the sample data were divided into calibration and validation periods with a 9:1 ratio. The calibration period from 16 April 2015 to 7 August 2022 was used for model training, while the validation period from 8 August 2022 to 23 May 2023 was used for model testing.
Figure 6 depicts the daily runoff series.
The meteorological data were obtained from the Climate Data Store platform [49], which provides various observed data, such as temperature and wind speed. In this study, specific humidity and surface temperature were selected as representative meteorological variables for input into the model. These data have high spatial and temporal resolution and accuracy, and are widely used in climate change research, hydrological model simulations, and ecological environmental assessments. The high-quality meteorological data effectively reflect the actual climatic characteristics of the Beipan River Basin, providing reliable support for model construction and optimization.
To ensure the reliability and applicability of the data, strict quality control and preprocessing were performed, including outlier removal and missing value imputation.
3.2. Model Configurations
In this study, training and testing data were generated using a sliding window function (with a window size of 5d, 7d, 9d, 11d, 14d, 16d, 18d, or 20d), which created fixed-length inputs, including historical meteorological information and runoff data, with the output being the runoff for the target forecast time. Normalization was applied to both the training and testing sets of the dataset, which was split into 90% for training and 10% for testing. Following model training, predictions were made on the test set, and the results were denormalized to return them to their actual values.
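A minimal sketch of the sliding-window construction used here (window length n, lead time m; function and array names are illustrative):

```python
import numpy as np

def make_windows(runoff, met, window, lead):
    """Build (X, y) samples: X holds `window` days of runoff and meteorology,
    y is the runoff `lead` days after the end of the window."""
    X, y = [], []
    for start in range(len(runoff) - window - lead + 1):
        end = start + window
        features = np.column_stack([runoff[start:end], met[start:end]])
        X.append(features)
        y.append(runoff[end + lead - 1])
    return np.array(X), np.array(y)

# Example: a 7-day window with a 3-day lead time
# X, y = make_windows(runoff_series, met_series, window=7, lead=3)
```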
Since the architectures of LSTM and GRU are very similar, the same input and output structure was designed for both models. The input sequence consisted of n days of historical runoff data and meteorological information, and the output sequence represented the runoff m days ahead. Both the input and output data were created using the sliding window approach.
In the LSTM-Pangu and GRU-Pangu models, the model parameters are identical to those of the LSTM and GRU models. In this study, a two-layer LSTM or GRU network was employed to model the runoff relationship based on empirical results. The first layer contained 30 neurons and the second layer 10. The input consisted of two feature groups, historical runoff data and historical meteorological information, and the output layer produced the predicted runoff value for the target time.
The Mean Squared Error (MSE) was used as the loss function:

$MSE = \dfrac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$

where $y_i$ is the observed value and $\hat{y}_i$ is the forecast value at time step i.
The Adam optimizer was used in the optimization process, with a learning rate of 0.001. To prevent overfitting, an early stopping mechanism was introduced during training: if the loss function changed by less than a predefined threshold (e.g., 0.001) over 10 consecutive iterations, training was stopped early. Ultimately, both the LSTM and GRU networks consist of two recurrent layers, with 30 units in the first layer and 10 in the second, followed by a fully connected layer for the final output.
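A minimal Keras sketch consistent with this configuration follows; the exact training scripts are not given in the paper, so the mapping of the described early-stopping rule to the `min_delta` and `patience` arguments, and all function names, are our assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(window, n_features, cell="LSTM"):
    """Two-layer recurrent network (30 + 10 units) with a dense output layer."""
    Recurrent = layers.LSTM if cell == "LSTM" else layers.GRU
    model = keras.Sequential([
        layers.Input(shape=(window, n_features)),
        Recurrent(30, return_sequences=True),   # first recurrent layer
        Recurrent(10),                          # second recurrent layer
        layers.Dense(1),                        # predicted runoff for the target day
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
    return model

# Early stopping: stop if the loss improves by less than 0.001 for 10 epochs
early_stop = keras.callbacks.EarlyStopping(monitor="loss", min_delta=0.001,
                                           patience=10, restore_best_weights=True)
# model = build_model(window=7, n_features=2)
# model.fit(X_train, y_train, epochs=200, batch_size=32, callbacks=[early_stop])
```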
To compare the performance of LSTM, GRU, LSTM-Pangu, and GRU-Pangu models, the LSTM-Pangu and GRU-Pangu models were trained with exactly the same parameter settings as the LSTM and GRU models to ensure the fairness and comparability of the results.
4. Results and Discussions
4.1. Comparison and Analysis of Model Performance Under Different Time Steps
This study conducted two sets of comparative experiments to comprehensively evaluate the effectiveness of the models in forecasting daily runoff for the Beipan River Basin: (1) an analysis of the differences between the LSTM-Pangu and LSTM models; and (2) a comparison between the GRU-Pangu and GRU models. The purpose of these experiments is to investigate how different model configurations and parameter settings affect runoff prediction accuracy.
In the first phase of the experiment, a lead time of 3d was fixed, and different input window lengths (5d, 7d, 9d, 11d, 14d, 16d, 18d, 20d) were set to explore their impact on the prediction accuracy. The sliding window method was employed to generate the corresponding training and testing samples, and prediction metrics were calculated for each model, including NSE, R, MAE, and RMSE. The specific results are shown in Table 1, Table 2, Table 3 and Table 4.
The following are each model’s prediction metrics when the time step is 5d: The LSTM, LSTM-Pangu, GRU, and GRU-Pangu models have respective NSE values of 0.8332, 0.8378, 0.8315, and 0.8364; the R values are 0.9127, 0.9208, 0.9119, and 0.9254; the MAE values are 24.1841, 23.2318, 25.1731, and 23.9230; and the RMSE values are 36.1030, 33.9629, 36.0200, and 33.9645. When the input window increases to 7, the NSE values increase by 0.0289, 0.0415, 0.0284, and 0.0317; the R values increase by 0.0157, 0.0169, 0.0154, and 0.0117; the MAE values decrease by 1.7721, 2.005, 2.8858, and 2.5578; and the RMSE values decrease by 3.1192, 3.7732, 3.1925, and 3.8754.
Overall, the LSTM-Pangu and GRU-Pangu models demonstrate more substantial improvements compared to LSTM and GRU. To quantitatively assess the significance of performance improvements introduced by the Pangu-enhanced models, we conducted paired two-tailed t-tests for each metric across all the tested time steps. The results, shown in Table 5, demonstrate that the improvements in NSE, R, MAE, and RMSE achieved by both the LSTM-Pangu and GRU-Pangu models are statistically significant (p < 0.05) when compared with their baseline counterparts. These findings validate the effectiveness of integrating Pangu weather features into the recurrent neural network frameworks and support our earlier claims of improved model accuracy.
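A paired two-tailed t-test of this kind can be reproduced with SciPy; a minimal sketch (the input arrays are to be filled with the per-time-step metric values, which are not repeated here):

```python
from scipy import stats

def paired_test(metric_baseline, metric_pangu):
    """Paired two-tailed t-test across time steps.

    metric_baseline, metric_pangu: one metric value per tested time step
    (5d, 7d, ..., 20d), paired by time step, e.g. the NSE columns of Tables 1-2.
    """
    t_stat, p_value = stats.ttest_rel(metric_baseline, metric_pangu)
    return t_stat, p_value
```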
However, when the time step is further increased to 11d and 14d, the prediction accuracy of the models no longer improves significantly and even shows a slight decline. The NSE of the LSTM-Pangu model, for instance, is 0.8317 at a time step of 11, which is lower than the 0.8793 at a time step of 7. When the time step is 14, the NSE of the GRU-Pangu model is 0.8367, slightly higher than the 0.8226 at a time step of 11. Meanwhile, the MAE and RMSE exhibit strikingly similar trends, indicating that excessively long time steps may introduce more noise. This is likely attributable to the diminished hydro-meteorological interactions between historical meteorological variables and future runoff observations when the time step becomes too large, thereby interfering with the model's predictive ability.
Line graphs were plotted to show the variation in the prediction accuracy metrics with the time step (as shown in Figure 7), providing a visual representation of the models' performance trends. The results show that a time step of 7 yields the best performance for both the traditional and improved models. Because runoff is inherently influenced by meteorological drivers, such as precipitation, temperature, and solar radiation, a short time step prevents the model from capturing adequate meteorological–runoff correlation information; therefore, as the time step grows, the models' prediction accuracy improves. However, the influence of runoff and meteorological information from earlier days on future runoff predictions gradually weakens, so an excessively long time step may introduce excessive irrelevant noise and thus reduce the prediction accuracy. A time step of 7 was found to strike a balance between capturing short-term correlations and avoiding information redundancy, achieving optimal performance. Further analysis indicates that as the time step rises from 5 to 20, the prediction accuracy of each model first increases, then decreases, and finally stabilizes, as seen in Figure 7.
4.2. Comparison and Analysis of Model Performance Under Different Lead Times
In the second stage of the experiment, the focus shifted to evaluating the prediction accuracy of the models using a fixed input time step of 7d across various lead times (2d, 3d, 4d, 5d). The experimental results are summarized in Table 6, Table 7, Table 8 and Table 9. From the analysis, it is evident that, regardless of whether the traditional LSTM and GRU models or the improved LSTM-Pangu and GRU-Pangu models are employed, the prediction accuracy is good for shorter lead times (e.g., a lead time of 2), and the disparities among the four models are generally small.
For instance, each model performs as follows when the lead time is 2: the NSE values of the LSTM, LSTM-Pangu, GRU, and GRU-Pangu models are 0.9364, 0.9321, 0.9353, and 0.9345; the R values are 0.9676, 0.9654, 0.9671, and 0.9666; the MAE values are 15.7868, 15.7625, 16.3566, and 16.3645; and the RMSE values are 24.1042, 25.2242, 24.2787, and 26.4813, respectively. These findings suggest that the prediction accuracy of the standard and improved models is comparable for shorter lead times, with no discernible benefit from the improvements.
As the lead time increases, both traditional and improved models show a certain degree of decline in the prediction accuracy. However, the improved LSTM-Pangu and GRU-Pangu models exhibit more significant advantages compared to the traditional models for longer lead times (e.g., a lead time of 5). For instance, the LSTM, LSTM-Pangu, GRU, and GRU-Pangu models’ respective NSE values are 0.6909, 0.7587, 0.6836, and 0.7642 at a lead time of 5; the R values are 0.8312, 0.8710, 0.8268, and 0.8741; the MAE values are 30.8651, 28.3561, 31.0889, and 29.4852; and the RMSE values are 51.0841, 45.7546, 50.6629, and 47.1663. Compared to the traditional models, the NSE of the LSTM-Pangu model improves by approximately 8.1%, and the NSE of the GRU-Pangu model improves by approximately 11.7%. This suggests that the improved models exhibit higher robustness and accuracy for runoff predictions over longer lead times.
It is evident from the line graphs in Figure 8 that the models' performance indicators change markedly with increasing lead times. In particular, the NSE and R metrics exhibit a clear downward trend, indicating that the fit and correlation of the models weaken as the lead time increases. Meanwhile, the MAE and RMSE metrics increase significantly, indicating that the prediction errors and biases grow as the lead time lengthens. Further analysis of the bar chart reveals that the gap between the LSTM and LSTM-Pangu models widens in all four performance metrics as the lead time increases, and the gap between the GRU and GRU-Pangu models likewise becomes more pronounced.
This phenomenon suggests that, at shorter lead times (e.g., lead times of 2 or 3), runoff prediction may rely more on the initial conditions, with meteorological factors having less of an impact. Therefore, the inclusion of the Pangu framework has a limited influence on the prediction accuracy. However, as the lead time extends, the influence of meteorological conditions on the runoff process becomes more pronounced. The models need to capture the complex relationship between runoff and meteorological data more accurately. At this stage, the LSTM-Pangu and GRU-Pangu models exhibit significant advantages due to their enhanced ability to extract and process meteorological information, leading to improved prediction accuracy.
In conclusion, the improved LSTM-Pangu and GRU-Pangu models more comprehensively reflect the driving mechanisms of the runoff process at longer lead times, offering higher prediction accuracy and robustness. This indicates that the introduction of the Pangu framework is significant for improving hydrological models, especially in application scenarios with long time spans and high-precision prediction requirements, where its advantages are particularly prominent.
4.3. Comparison and Visualization Analysis of Model Prediction Results
Scatter plots for the LSTM, LSTM-Pangu, GRU, and GRU-Pangu models (with a time step of 7 and a lead time of 3) were created to visually compare their prediction performance (as shown in Figure 9). These plots show the agreement between the predicted and observed values. From the scatter plots, it can be seen that, whether for the traditional LSTM and GRU models or the improved LSTM-Pangu and GRU-Pangu models, the prediction accuracy is high under low-flow conditions, with predicted values closely matching the observed values. This suggests that the variability of runoff during low-flow periods is adequately captured by all the models.
However, during the medium-to-high flow phases, the LSTM-Pangu and GRU-Pangu models significantly outperform the traditional LSTM and GRU models. The enhanced models’ scatter plot distribution is nearer the regression line, suggesting that they more precisely depict the trend and magnitude, with a particularly significant advantage in peak value prediction.
Further analysis indicates that the bias in traditional models during medium-to-high flow phases is primarily due to systematic underestimation, with predicted values deviating significantly from the observed values. This systematic bias mainly stems from the models’ limited ability to capture the complex and nonlinear hydrological responses associated with high-flow events. In particular, runoff during such periods is strongly influenced by intense rainfall or sudden upstream inflows, which introduce substantial variability and nonlinearity into the system. Since these high-flow events occur less frequently in the dataset, traditional models may not adequately learn their corresponding patterns, leading to consistent underprediction.
Traditional models, such as LSTM and GRU, rely solely on historical runoff and meteorological data, and often fail to capture the rapid dynamics of these extreme events—especially in terms of peak magnitudes and timing. In contrast, the improved models (LSTM-Pangu and GRU-Pangu), through the integration with the Pangu model for enhanced meteorological forecasting, can better represent key driving factors and their interactions with runoff. This significantly improves the models’ robustness and accuracy during dynamic high-flow conditions.
On the other hand, during low-flow periods, runoff is primarily governed by relatively stable processes, such as groundwater recharge, soil infiltration, and baseflow, which are less sensitive to short-term meteorological fluctuations. As a result, both traditional and improved models perform well under these conditions.
To further address the issue of systematic underestimation in traditional models during high-flow phases, potential strategies include enriching the input feature set with more detailed hydrometeorological variables, incorporating spatially distributed data, or exploring hybrid or physics-informed deep learning frameworks to better represent the underlying hydrological processes during high-flow events.
For runoff prediction in the Beipan River Basin, accurately predicting peak values is especially crucial as it directly affects water resource management efficiency and flood control effectiveness. The LSTM-Pangu and GRU-Pangu models can predict peak values more accurately, indicating their greater practicality and reliability in real-world applications. Overall, the improved models proposed in this study not only capture the trend changes in the overall runoff in the basin more effectively, but also provide stronger support for predicting extreme events during medium-to-high flows, offering important reference for the subsequent research on runoff prediction.
In summary, due to the high precision of the Pangu model in meteorological prediction, the LSTM-Pangu and GRU-Pangu models achieved significant improvements in runoff prediction compared to traditional models. Their rolling prediction framework not only fully considers different lead times but also demonstrates strong applicability and transferability, with broad potential for real-world applications.
5. Conclusions
In order to enhance the performance of conventional LSTM and GRU models in daily runoff prediction, this study presented the LSTM-Pangu and GRU-Pangu models, which combine the meteorological prediction capability of the Pangu-Weather model with data-driven runoff modeling. Runoff is inherently influenced by meteorological drivers, such as precipitation, temperature, and solar radiation, which directly affect processes like infiltration, evaporation, and snowmelt; by integrating these variables, the models can more accurately reflect the physical processes governing runoff generation, particularly under changing weather conditions. These improved models effectively describe the relationship between predictive factors and runoff. Their advantage is limited at short lead times, but it becomes increasingly pronounced as the lead time grows, particularly in scenarios where accurate meteorological forecast data are unavailable. According to the experimental findings derived from the daily runoff data of the Beipan River Basin, the improved models show a promising predictive performance. The specific conclusions are as follows:
- (1)
For both the LSTM and GRU models, as well as the improved LSTM-Pangu and GRU-Pangu models, the optimal time step is consistently seven days. As the time step increases beyond this, the forecast accuracy gradually declines and then stabilizes.
- (2)
As the lead time increases, every model’s predicted accuracy decreases to some extent. However, the accuracy of the LSTM and GRU models decreases more rapidly. On the other hand, the accuracy of the LSTM-Pangu and GRU-Pangu models declines more gradually, with the advantages of the improved models becoming increasingly pronounced over longer lead times.
- (3)
In daily runoff predictions, the LSTM, GRU, LSTM-Pangu, and GRU-Pangu models all exhibit high accuracy in predicting low runoff levels, with minimal differences among the models. The LSTM-Pangu and GRU-Pangu models, however, perform noticeably better in terms of their forecast accuracy for medium- and high-runoff levels than the conventional LSTM and GRU models.
In conclusion, the observed daily runoff in the Beipan River Basin exhibits significant variability during the summer flood season. Traditional LSTM and GRU models, lacking essential meteorological forecast data, fail to effectively capture runoff dynamics, resulting in a poorer predictive performance. In contrast, the LSTM-Pangu and GRU-Pangu models, by incorporating future meteorological factors, are better equipped to reflect runoff variability accurately.