1. Introduction
Accurate river flow forecasting is particularly important in small catchments, where hydrological responses occur rapidly and with high variability. In such contexts, hourly-scale prediction is especially valuable, as it enables timely flood warnings, supports emergency decision-making, and enhances the efficiency of water resource allocation [1]. Compared with daily forecasts, hourly predictions provide finer temporal resolution, which is crucial for addressing the challenges of flash floods, dam regulation, and real-time watershed management [2].
Despite its importance, hourly-scale river flow forecasting remains underexplored. Traditional hydrological models, such as SWAT and HEC-HMS, provide physically interpretable frameworks but face limitations in parameter calibration, structural complexity, and computational cost [3,4]. With the rise of data-driven approaches, machine learning methods such as artificial neural networks (ANNs), support vector machines (SVMs), and random forests (RFs) have been applied to flow forecasting [5,6,7,8]. However, these models are often criticized as “black boxes” that are prone to overfitting and lack interpretability [9]. To overcome these shortcomings, deep learning models, particularly recurrent neural networks (RNNs) and their variants, have been widely introduced into hydrology. The long short-term memory (LSTM) network alleviates the vanishing-gradient problem and captures long-term dependencies, while the gated recurrent unit (GRU) offers a simplified structure with comparable performance [10,11,12,13]. Nevertheless, their accuracy often degrades when applied to highly nonlinear or long-duration flood events. More recently, Transformer-based models have demonstrated strong capabilities in long-sequence forecasting through self-attention mechanisms [14,15]. Variants such as Informer and Autoformer have improved efficiency and long-term prediction accuracy [16,17], yet they still struggle to capture local details, exhibit high computational complexity, and often require large datasets, making them less suitable for small catchments [18]. To further enhance sequence modeling, the xLSTM framework was recently introduced as an extension of LSTM, designed to improve memory capacity and temporal feature extraction [19]. While xLSTM has shown promising results, it remains limited in capturing multi-scale temporal dependencies and in emphasizing key timestep features, both of which are critical in hourly-scale hydrological forecasting.
In this study, an enhanced temporal attention module xLSTM (TAM-xLSTM) model is proposed to address these challenges. To evaluate model performance, a multi-source hydrometeorological dataset was constructed by combining meteorological, hydrological, and surface energy variables. Case studies were carried out at three stations in Guizhou Province, China: Panghai, Xinghua, and Zhijin. Panghai Station is affected by dam operations during the dry season, Xinghua Station represents near-natural hydrological conditions, and Zhijin Station provides nearly 50 years of streamflow records. These diverse settings enable a comprehensive assessment of forecasting accuracy and generalization capability.
The objective of this study is to develop a robust deep learning framework capable of capturing multi-scale temporal dependencies and critical hydrological features for fine-scale flow prediction. Specifically, TAM-xLSTM integrates dilated temporal convolutions and channel–temporal attention mechanisms to enhance both accuracy and stability. By testing the model across different hydrological regimes and long-term historical records, this work aims to advance methodological development in hourly-scale hydrological forecasting and to provide practical insights for flood risk management and water resource planning.
3. Methods
3.1. Research Strategy
The methodology of this study follows a structured workflow designed to ensure both predictive performance and physical interpretability. The overall strategy consists of the following steps:
Data acquisition and preprocessing—Collection of hourly hydrological and meteorological observations from six stations in the Qiandongnan region, supplemented with satellite-derived radiation data. Records were temporally aligned, missing values interpolated, and variables integrated into a unified dataset.
Feature construction—Inclusion of precipitation, solar radiation, and basin morphometric indices to represent both external hydrometeorological drivers and intrinsic catchment response mechanisms.
Model development—Design of the TAM-xLSTM, integrating a T-WaveNet module and a CBAM-1D attention mechanism to enhance multi-scale temporal feature extraction and timestep-level sensitivity.
Model training and validation—Generation of training samples using a sliding window approach, with explicit train/validation/test splits and a three-day buffer period to prevent temporal leakage. Supervised learning with early stopping was employed for parameter optimization.
Evaluation and uncertainty analysis—Assessment of predictive performance using RMSE, MAE, Theil’s U, and NSE, complemented by bootstrap-based 95% confidence intervals to quantify uncertainty and robustness.
This workflow is summarized in the schematic diagram (Figure 1), which illustrates the overall research strategy from data preparation to model evaluation.
3.2. Study Area
This study focuses on the Qiandongnan region in Southeastern Guizhou Province, China, located between latitudes 24°19′ N–26°49′ N and longitudes 107°17′ E–109°35′ E. Situated in the transition zone between the Yungui Plateau and the Western Hunan hills, the region exhibits complex topography and hydroclimatic conditions, representative of mountainous plateau areas in Southwest China. Elevation ranges from below 300 m to over 2000 m, forming distinct vertical climate zones. Extensive karst landforms, including depressions, caves, and subterranean rivers, strongly influence local hydrological processes [21].
Qiandongnan belongs to the Pearl River Basin and features a dense river network. The Duliu River, a major tributary of the Qianjiang system, spans about 310 km with a drainage area of 15,600 km², while the Qingshui River, part of the Longjiang system, extends 280 km with a basin area of 13,000 km². These rivers provide water for domestic use, irrigation, ecological functions, and hydropower development. The region has a subtropical humid monsoon climate, with an annual mean temperature of about 15 °C and precipitation of 1100–1400 mm, most of which falls between May and September. Influenced by terrain, rainfall distribution is highly uneven, and localized extreme events often trigger floods and geological hazards.
Overall, the study area’s combination of rugged topography, diverse river systems, and pronounced seasonal variability makes it suitable for investigating hydrological responses in mountainous catchments and for developing river flow prediction models. The geographical distribution is shown in Figure 2.
3.3. Dataset
This study utilizes hourly river flow, precipitation, and solar radiation data to investigate the influence of these climatic variables on river discharge. The dataset comprises hydrological and meteorological observations collected from six monitoring stations in the Qiandongnan region of Guizhou Province, China, covering the period from 2022 to 2023, as summarized in Table 1. Historical data for these variables were obtained from the Guizhou Meteorological Bureau and the FY-4A satellite, ensuring high accuracy and consistency with the model design. To address missing meteorological values caused by occasional monitoring equipment failures, linear interpolation was applied to maintain data completeness. Although meteorological forecast variables such as precipitation forecasts, temperature, pressure, and wind speed are practically relevant, the available forecast data in the Qiandongnan region suffer from large amounts of missing values and low measurement accuracy. Using these data directly may introduce noise and reduce prediction reliability. Therefore, this study employs high-resolution gridded precipitation datasets and satellite-based remote sensing products, which provide more complete temporal coverage and higher spatial accuracy. Preliminary experiments further confirmed that these datasets yield better model performance than the available meteorological forecasts.
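The gap-filling step mentioned above can be sketched as follows. This is a minimal illustration, assuming hourly values held in a list with `None` marking missing observations; the function name `interpolate_linear` and the choice to leave leading or trailing gaps unfilled are assumptions made for the example, not details reported in this study.

```python
from typing import List, Optional


def interpolate_linear(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by straight-line interpolation.

    Gaps at the start or end of the series have no neighbour on one side,
    so they are left as None (an assumption of this sketch).
    """
    filled = list(series)
    # Indices of the observations that are actually present.
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap > 1:
            # Constant increment between the two bracketing observations.
            step = (series[right] - series[left]) / gap
            for k in range(1, gap):
                filled[left + k] = series[left] + step * k
    return filled


# Hypothetical hourly rainfall record with two interior gaps and one trailing gap.
hourly_rain = [0.0, None, None, 3.0, 1.0, None]
print(interpolate_linear(hourly_rain))
```

Linear interpolation is reasonable for short outages; for long gaps in an intermittent variable such as rainfall it can invent precipitation that never fell, so gap length is worth checking before filling.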
To ensure data consistency, the raw data in Table 1 were carefully extracted and preprocessed, with particular attention paid to temporal alignment across sources. For example, the surface solar radiation data from the FY-4A satellite were timestamped using Coordinated Universal Time (UTC), whereas the ground-based precipitation records were logged in Beijing Time (UTC+8). Therefore, to synchronize the two datasets, the timestamps of the satellite data were uniformly shifted forward by 8 hours to match the local time. After preprocessing, the hydrometeorological variables were integrated into a unified dataset. The features of the final dataset are summarized in Table 2.
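The 8-hour alignment between the UTC-stamped satellite records and the Beijing Time ground records can be illustrated with the Python standard library. The helper name `utc_to_beijing` and the sample timestamp are hypothetical; the sketch assumes that naive timestamps in the satellite product are in UTC.

```python
from datetime import datetime, timedelta, timezone

# Beijing Time is a fixed offset of UTC+8 with no daylight saving.
BEIJING = timezone(timedelta(hours=8))


def utc_to_beijing(ts: datetime) -> datetime:
    """Convert a UTC timestamp to Beijing Time.

    Naive timestamps are assumed to be UTC (an assumption of this sketch).
    """
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(BEIJING)


# A hypothetical satellite record stamped 18:00 UTC on 30 June 2022
# corresponds to 02:00 on 1 July 2022 in Beijing Time.
satellite_ts = datetime(2022, 6, 30, 18, 0)
local_ts = utc_to_beijing(satellite_ts)
print(local_ts.isoformat())
```

Using timezone-aware objects rather than adding a bare 8-hour offset keeps the date rollover explicit, which matters when hourly records are later joined on their timestamps.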
Beyond their statistical role as model inputs, these variables also reflect underlying physical processes that govern river flow, particularly at the hourly scale. Precipitation is the most direct driver of discharge, with rainfall events rapidly generating surface runoff and sharp flow rises within hours. Solar radiation influences flow indirectly through evapotranspiration and snowmelt: strong radiation may enhance evapotranspiration and reduce discharge, while in snow-affected basins it can accelerate meltwater contribution. Basin area determines the spatial extent of water collection, with larger basins sustaining higher long-term discharge but exhibiting slower hydrological responses, whereas smaller basins respond more quickly to rainfall. Average slope reflects topographic steepness: steeper slopes promote faster overland flow and reduced infiltration, leading to more pronounced short-term peaks, while gentler slopes delay and attenuate hydrographs. Similarly, stream length and density shape runoff concentration: longer channels and lower densities delay and dampen flood peaks, whereas shorter channels and higher densities accelerate flow convergence.
Together, these meteorological and morphometric variables capture both external hydrometeorological drivers and intrinsic basin response mechanisms, thereby providing a physically meaningful foundation for improving the accuracy and interpretability of hourly flow forecasting.
To evaluate the interrelationships among the selected features, a correlation matrix analysis was conducted on the hydrometeorological variables, and a corresponding heatmap was generated. As shown in Figure 3, among the selected features, Water_Level exhibits a strong positive correlation with river flow, indicating that it directly reflects hydrodynamic conditions. In contrast, meteorological variables such as precipitation-related metrics show relatively low linear correlation coefficients. Although their direct linear relationship with river flow is weak, these variables may still contribute valuable information in a nonlinear modeling framework, particularly in capturing short-term fluctuations and lagged hydrological responses.
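A correlation matrix of this kind can be computed as in the sketch below, which assumes the Pearson coefficient (the text does not name the coefficient used). The column names and values are hypothetical and serve only to show the calculation.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson linear correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Pairwise Pearson correlations, returned as a nested dictionary."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


# Hypothetical toy values; the real analysis uses the variables in Table 2.
features = {
    "River_Flow": [10.0, 12.0, 15.0, 30.0, 22.0],
    "Water_Level": [1.0, 1.1, 1.3, 2.0, 1.6],
    "Precipitation": [0.0, 5.0, 0.0, 1.0, 0.0],
}
matrix = correlation_matrix(features)
print(round(matrix["River_Flow"]["Water_Level"], 3))
```

Because the coefficient measures only linear association, a low value for precipitation does not rule out a lagged or nonlinear relationship with flow, which is the point made in the paragraph above.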
In addition to the main experiment using hourly data, this paper introduces a long-term, daily-scale hydrometeorological dataset from the Zhijin Hydrological Station in Bijie City, Guizhou Province, as supplementary validation to further evaluate the robustness and generalization of the proposed model over long time spans. This dataset covers the period from 1975 to 2023 and contains basic hydrometeorological characteristics such as daily rainfall, temperature, wind speed, air pressure, and water level. Although this dataset differs from the hourly data used in the main experiment in terms of temporal resolution and feature composition, its long-term nature helps evaluate the model’s performance under different hydrological conditions, particularly in extreme climate years, further validating the model’s temporal transfer capabilities and generalization.
3.4. T-WaveNet Module
Graph WaveNet, an extension of the original WaveNet architecture, incorporates both graph convolution layers and temporal convolution layers to model spatiotemporal data structures [22]. While effective in capturing spatial dependencies, traditional Graph WaveNet models suffer from increased computational complexity and training difficulty due to the integration of graph convolution operations. To address these limitations, this study proposes temporal-WaveNet (T-WaveNet), a streamlined variant of WaveNet that removes the graph convolution component while preserving the model’s powerful temporal modeling capabilities. Unlike Graph WaveNet, T-WaveNet exclusively focuses on processing temporal sequences through gated temporal convolutional layers (Gated TCN), avoiding graph-based operations. This design results in a more lightweight architecture with reduced computational overhead, while retaining the strengths of WaveNet in time series forecasting. The structure of the T-WaveNet module is illustrated in Figure 4.
T-WaveNet is a lightweight variant of the WaveNet architecture, specifically designed for temporal data modeling. By removing the complexity of graph convolutional layers, T-WaveNet simplifies the overall model structure while preserving the powerful temporal modeling capabilities of the original WaveNet. This design significantly improves computational efficiency and offers enhanced flexibility and scalability, making it well suited for real-time prediction tasks involving large-scale time series data.
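To make the gated temporal convolution concrete, the sketch below implements a single-channel dilated causal convolution and the WaveNet-style gate, in which a tanh "filter" branch is multiplied by a sigmoid "gate" branch. It is an illustration under assumptions, not the T-WaveNet implementation: the kernel size, the dilation schedule (1, 2, 4, 8), and all weights are hypothetical, since this section does not list them.

```python
import math
from typing import List


def causal_dilated_conv(x: List[float], w: List[float], dilation: int) -> List[float]:
    """y[t] = sum_k w[k] * x[t - k * dilation], with zeros before the series start.

    Only current and past inputs are used, so the operation is causal.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, wk in enumerate(w):
            idx = t - k * dilation
            if idx >= 0:
                acc += wk * x[idx]
        out.append(acc)
    return out


def gated_tcn(x: List[float], w_filter: List[float], w_gate: List[float],
              dilation: int) -> List[float]:
    """WaveNet-style gated activation: tanh(filter branch) * sigmoid(gate branch)."""
    f = causal_dilated_conv(x, w_filter, dilation)
    g = causal_dilated_conv(x, w_gate, dilation)
    return [math.tanh(a) / (1.0 + math.exp(-b)) for a, b in zip(f, g)]


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of past timesteps visible to a stack of dilated causal layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


# Hypothetical configuration: kernel size 2 with dilations doubling per layer.
print(receptive_field(2, [1, 2, 4, 8]))  # 16 timesteps
print(causal_dilated_conv([1.0, 2.0, 3.0, 4.0], [1.0, 1.0], dilation=2))
```

Doubling the dilation at each layer makes the receptive field grow exponentially with depth while the parameter count grows only linearly, which is why stacked dilated convolutions can cover both short-term fluctuations and longer trends at low cost.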
3.5. CBAM-1D Module
The convolutional block attention module (CBAM) is an attention mechanism designed to enhance the representational power of convolutional neural networks [23]. To improve the model’s ability to capture key feature channels and critical temporal steps, this study adopts the 1D convolutional block attention module (CBAM-1D) to reweight the input time series data. Unlike conventional spatial attention mechanisms, CBAM-1D incorporates a temporal attention mechanism to enhance the model’s focus on important moments in sequential data. The CBAM-1D module first applies channel attention followed by temporal attention, and its architecture is illustrated in Figure 5.
The input features are processed through the channel attention mechanism, which dynamically learns the importance weights of each feature channel. By computing these weights, the model effectively captures the most critical information from the input features and generates optimized intermediate representations. The structure of the channel attention module is shown in Figure 6.
The specific process is as follows: given the input features $F \in \mathbb{R}^{C \times T}$, global average pooling and global max pooling are first performed along the temporal dimension $T$ to obtain two channel descriptors. These descriptors are then passed through a shared two-layer multilayer perceptron (MLP) for feature transformation. Finally, a sigmoid activation function is applied to produce the channel attention weights $M_c \in \mathbb{R}^{C \times 1}$. The computation of channel attention is formulated as follows:
$$M_c = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)$$
where $\sigma$ is the sigmoid activation function.
Subsequently, the input features are element-wise multiplied with the channel attention weights to obtain the enhanced feature map $F'$:
$$F' = M_c \odot F$$
The timestep attention module is designed to dynamically adjust the importance of different timesteps in the input sequence, allowing the model to automatically focus on critical moments based on task requirements. In time series data, some timesteps may contain more crucial information and have a greater impact on the prediction results, while others may include noise or less informative content. By assigning a weighting coefficient to each timestep, the module ensures that the model can adaptively modulate its attention across the sequence. The structure of the timestep attention module is illustrated in Figure 7.
The specific process is as follows: First, global max pooling and average pooling are performed along the channel dimension $C$ of the channel-refined feature map $F'$ to obtain two one-dimensional temporal feature sequences, which are then concatenated. The concatenated features are fed into a 1D convolutional network (Conv1D) to extract local temporal patterns. Finally, a sigmoid activation function is applied to generate the temporal attention weights $M_t \in \mathbb{R}^{1 \times T}$. The calculation process of the temporal attention is as follows:
$$M_t = \sigma\big(\mathrm{Conv1D}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big)$$
where $\mathrm{Conv1D}$ is the 1D convolution operation and $[\,\cdot\,;\,\cdot\,]$ represents feature concatenation.
Finally, the feature map is multiplied element-wise with the temporal attention weights to obtain the final output $F''$:
$$F'' = M_t \odot F'$$
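The two attention stages can be traced end to end with a small pure-Python sketch. It follows the sequence described above (channel attention, then temporal attention) but is not the authors' implementation: the ReLU inside the shared MLP, the zero "same" padding, the kernel length, and all weight values are assumptions chosen for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # feature map of shape (C, T): channels x timesteps


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(v: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Two-layer MLP; w1 has shape (H, C), w2 has shape (C, H). ReLU is assumed."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(f: Matrix, w1: Matrix, w2: Matrix) -> List[float]:
    """One weight per channel from average- and max-pooling over time."""
    avg = [sum(ch) / len(ch) for ch in f]
    mx = [max(ch) for ch in f]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def temporal_attention(f: Matrix, kernel: Matrix) -> List[float]:
    """One weight per timestep; kernel has shape (2, K) over the [avg; max] maps."""
    n_ch, n_t = len(f), len(f[0])
    avg = [sum(f[c][t] for c in range(n_ch)) / n_ch for t in range(n_t)]
    mx = [max(f[c][t] for c in range(n_ch)) for t in range(n_t)]
    pad = len(kernel[0]) // 2  # zero padding keeps the output length equal to T
    weights = []
    for t in range(n_t):
        acc = 0.0
        for row, seq in zip(kernel, (avg, mx)):
            for k, wk in enumerate(row):
                idx = t + k - pad
                if 0 <= idx < n_t:
                    acc += wk * seq[idx]
        weights.append(sigmoid(acc))
    return weights


def cbam_1d(f: Matrix, w1: Matrix, w2: Matrix, kernel: Matrix) -> Matrix:
    """Channel attention followed by temporal attention."""
    m_c = channel_attention(f, w1, w2)
    f1 = [[m_c[c] * v for v in f[c]] for c in range(len(f))]
    m_t = temporal_attention(f1, kernel)
    return [[m_t[t] * f1[c][t] for t in range(len(m_t))] for c in range(len(f1))]


# Hypothetical input: 2 channels, 3 timesteps, with small illustrative weights.
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
out = cbam_1d(features, w1=[[0.1, -0.2]], w2=[[0.3], [0.5]],
              kernel=[[0.1, 0.2, 0.1], [0.0, 0.1, 0.0]])
print(len(out), len(out[0]))  # output keeps the (C, T) shape
```

Both weight vectors pass through a sigmoid, so every channel and timestep is scaled by a factor between 0 and 1; the module reweights the input without changing its shape, which lets it sit between the convolutional front end and the recurrent encoder.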
3.6. TAM-xLSTM Network Architecture
To further improve the accuracy of hydrological flow prediction and the capability of temporal feature modeling, this study proposes the temporal attention module xLSTM (TAM-xLSTM) model, which integrates the T-WaveNet module and the CBAM-1D module. TAM-xLSTM is designed to address the limitations of traditional LSTM models in capturing long-term dependencies in complex time series data. By introducing attention mechanisms, the model dynamically adjusts the importance of different timesteps, thereby enhancing overall performance. The architecture of the TAM-xLSTM network is shown in Figure 8.
The TAM-xLSTM model consists of three main modules: the T-WaveNet module, the CBAM-1D attention module, and the xLSTM encoding module. These modules work in close coordination to effectively enhance the model’s ability to perceive multi-scale temporal features and respond to key time points. The T-WaveNet module is composed of multiple layers of dilated causal convolutions, enabling the model to capture both short-term fluctuations and long-term trends in the input sequence across different receptive fields. The core idea of the CBAM-1D module is to dynamically assess the importance of feature dimensions using the channel attention mechanism, followed by a modified temporal attention mechanism to identify key time segments. This module outputs a weighted temporal feature sequence, providing the xLSTM model with more significant and focused temporal input. The xLSTM is built using the msm structure, incorporating both sLSTM and mLSTM units. It includes a multi-head mechanism and projection capability, enabling it to encode the dynamic changes of time series from multiple perspectives. Upon receiving the weighted temporal input, the xLSTM processes the data step-by-step and learns deep temporal dependencies.
The entire TAM-xLSTM model takes hourly river flow and meteorological data as input and sequentially extracts temporal features through the aforementioned modules. Finally, a linear layer is used to output the predicted values. This structure combines the efficiency of local convolutional modeling with the temporal modeling advantages of the xLSTM architecture, significantly enhancing the model’s adaptability and prediction accuracy for non-stationary time series data.
3.7. Validation and Evaluation Metrics
This study adopts a rigorous cross-validation method, training and testing different models using the same dataset. To comprehensively evaluate the prediction performance of the model, four evaluation metrics with clear hydrological significance are used: the root mean square error (RMSE), the Nash–Sutcliffe efficiency (NSE), the mean absolute error (MAE), and Theil’s U.
The RMSE is the square root of the average of the squared differences between the predicted values and the true values. Because the errors are squared before averaging, RMSE is particularly sensitive to large deviations, such as those arising in extreme river flow predictions. In applications such as flood forecasting, accurately predicting river flow is crucial. This metric has been widely used in hydrological model evaluation. The calculation formula is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
In the formula, $y_i$ is the i-th observed value, $\hat{y}_i$ is the i-th predicted value, and $n$ is the number of data points.
The NSE coefficient is a widely used performance evaluation metric in hydrological modeling, meteorological forecasting, and environmental simulations [24]. It measures the degree of fit between the model’s predicted values and the actual observed values and is particularly suitable for assessing the prediction accuracy of hydrological time series such as river flow and precipitation. The calculation formula is:
$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
In the formula, $y_i$ is the i-th observed value, $\hat{y}_i$ is the i-th predicted value, $\bar{y}$ is the mean of all observed values, and $n$ is the number of data points.
The MAE is the average of the absolute errors between the predicted values and the actual values. It intuitively reflects the average magnitude of prediction errors and is less sensitive to outliers than the RMSE, making it suitable for assessing the stability of daily river flow predictions. This metric is often used as a benchmark for comparing hydrological models. The calculation formula is:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
In the formula, $y_i$ is the i-th observed value, $\hat{y}_i$ is the i-th predicted value, and $n$ is the number of data points.
Theil’s U statistic is a metric used to measure the deviation between regression model predictions and actual observed data. By normalizing the RMSE, this indicator allows for the comparison of prediction performance across different basins or time scales. A value of $U < 1$ indicates that the model performs better than a naive forecast, making it particularly suitable for evaluating the systematic bias of river flow prediction models. The calculation formula is:
In the formula, $y_i$ is the i-th observed value, $\hat{y}_i$ is the i-th predicted value, and $n$ is the number of data points.
The combined use of these evaluation metrics enables a comprehensive assessment of the model’s performance across multiple dimensions of river flow prediction, offering valuable insights for targeted model optimization. In particular, under varying application scenarios, such as extreme event forecasting and routine daily flow simulation, each metric captures distinct aspects of model behavior, thereby providing a multifaceted understanding of its strengths and limitations.
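The four metrics can be computed with a few lines of Python, sketched below. RMSE, MAE, and NSE follow their standard definitions. For Theil’s U, several variants exist in the literature; the function here assumes the form that divides the model’s RMSE by the RMSE of a naive persistence forecast (each value predicted by the previous observation), which is consistent with reading values below 1 as better than a naive forecast. Whether this matches the exact variant used in this study is an assumption.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def theil_u(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Model error relative to a persistence forecast (assumed variant).

    The first timestep is skipped because persistence has no previous value there.
    """
    model_err = sum((p - o) ** 2 for o, p in zip(obs[1:], pred[1:]))
    naive_err = sum((b - a) ** 2 for a, b in zip(obs, obs[1:]))
    return math.sqrt(model_err / naive_err)


# Hypothetical flows in m³/s, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
print(rmse(observed, predicted), mae(observed, predicted))
print(round(nse(observed, predicted), 3), round(theil_u(observed, predicted), 3))
```

In this toy case a single error of 2 m³/s gives RMSE = 1.0 but MAE = 0.5, which shows how squaring weights a large miss more heavily than averaging absolute errors does.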
4. Results
The processor used in this experiment is an Intel Core i5-12400F (Intel, Santa Clara, CA, USA) with a base frequency of 2.5 GHz. The graphics card is an NVIDIA GeForce RTX 4060 (NVIDIA Corporation, Santa Clara, CA, USA) with 8 GB of video memory. The experimental machine has 16 GB of memory and runs Windows 10. The network model was built using the PyTorch 2.0.1 deep learning framework, with CUDA version 11.0. The experimental parameters adopted by each network model are as follows: the number of training epochs is 400, the input sliding window is set to 72, the batch size is 1024, and the Adam optimizer is used to update parameters during training, with an initial learning rate of 0.001. The experimental environment and parameter settings are shown in Table 3.
This study conducts a comprehensive training and evaluation of six models: LSTM, GRU, Transformer, Informer, xLSTM, and the improved TAM-xLSTM. The models are trained using hourly data on river flow and precipitation collected from six hydrological stations in the Qiandongnan region of Guizhou Province, together with solar radiation data obtained from the FY-4A geostationary meteorological satellite. To construct the training dataset, a sliding window technique with a window length of 72 hours is applied, enabling the models to capture short-term to medium-term hydrological variation patterns.
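The sliding window construction can be sketched as follows. The window length of 72 hours comes from the text; the one-step-ahead target (`horizon=1`), the function name `make_windows`, and the toy data are assumptions made for the example, since the forecast lead time is not stated in this passage.

```python
from typing import List, Sequence, Tuple


def make_windows(features: Sequence, target: Sequence[float],
                 window: int = 72, horizon: int = 1) -> List[Tuple[Sequence, float]]:
    """Pair each run of `window` consecutive records with a later target value.

    The target is taken `horizon` steps after the last record in the window.
    """
    samples = []
    for start in range(len(features) - window - horizon + 1):
        end = start + window
        samples.append((features[start:end], target[end + horizon - 1]))
    return samples


# Hypothetical series of 100 hourly records; each record would normally hold
# flow, precipitation, and radiation values rather than a single number.
hours = list(range(100))
flow = [float(h) for h in hours]
samples = make_windows(hours, flow)
print(len(samples))                      # 28 samples from 100 hourly records
print(samples[0][0][0], samples[0][0][-1], samples[0][1])  # hours 0 to 71 predict hour 72
```

Consecutive windows overlap by 71 hours, so samples drawn from adjacent periods share most of their inputs. This is why a buffer between the validation and test periods is needed to keep test inputs out of earlier splits.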
For data partitioning, the training set comprises all remaining records from the six stations covering January 2022 to December 2023, excluding the validation and test periods of the target station. At each target station, the two months immediately preceding the test month are reserved as the validation set, while the entire target month serves as the test set. To avoid information leakage from temporal overlap, a three-day buffer period is inserted between the validation and test intervals. This split ensures strict temporal separation among training, validation, and testing, allowing robust evaluation of model generalization. The specific validation and test periods for Panghai and Xinghua stations are shown in Table 4.
The training process follows a conventional supervised learning framework. In each iteration, the input sequence is fed forward through the network to generate predicted outputs. The difference between the predicted and observed values is computed, and the MSE is used as the loss function to evaluate prediction accuracy. The backpropagation algorithm is employed to propagate the error gradient from the output layer to the input layer. Through the chain rule, the partial derivatives of the loss function with respect to each model parameter are calculated. The Adam optimizer is then used to update the model weights based on the calculated gradients and a predefined learning rate. Training continues until the model’s performance on the validation set ceases to improve or the maximum number of training epochs is reached.
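The stopping rule at the end of this procedure can be sketched independently of any deep learning framework. The maximum of 400 epochs is taken from the experimental settings; the patience value, the `min_delta` threshold, and the class and function names are hypothetical, because the text states only that training stops once validation performance ceases to improve.

```python
from typing import List, Tuple


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without improvement."""

    def __init__(self, patience: int = 20, min_delta: float = 0.0) -> None:
        self.patience = patience      # assumed value, not reported in the text
        self.min_delta = min_delta    # smallest decrease that counts as progress
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run_training(val_losses: List[float], max_epochs: int = 400,
                 patience: int = 20) -> Tuple[int, float]:
    """Replay per-epoch validation losses and report (last epoch run, best loss).

    In real training each loss would come from evaluating the model on the
    validation set after one pass of forward propagation, backpropagation,
    and an optimizer update.
    """
    stopper = EarlyStopping(patience=patience)
    last_epoch = 0
    for last_epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if stopper.should_stop(loss):
            break
    return last_epoch, stopper.best


# Hypothetical validation MSE curve: improvement stalls after the second epoch.
print(run_training([1.0, 0.8, 0.9, 0.85, 0.81, 0.7], patience=3))
```

With a patience of 3, training halts at epoch 5 even though epoch 6 would have improved. A short patience saves time but can stop before a late gain, so in practice the weights from the best epoch are usually saved and restored.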
To evaluate the predictive performance of TAM-xLSTM in capturing fine-scale hydrological processes, such as short-term fluctuations and rapid changes in flow, this study selects two representative hydrological stations as key testing sites: Panghai Station in the Qingshui River Basin and Xinghua Station in the Duliu River Basin. Experiments are conducted under both wet season and dry season conditions, and the results are compared with those of several mainstream baseline models. The statistical properties of the hydrometeorological input variables of river flow, rainfall, and solar radiation at Panghai and Xinghua stations are summarized in Table 5. Detailed experimental results and performance comparisons are presented in Table 6.
Table 5 reports river flow (m³/s), hourly rainfall (mm), and solar radiation (W/m²) for Panghai on the Qingshui River and Xinghua on the Duliu River. The mean flow at Xinghua is 34.5 m³/s, about half of Panghai’s 76.1 m³/s. The maximum flow at Xinghua reaches 2192.6 m³/s, well above Panghai’s 1222.9 m³/s. The standard deviations are similar, at 90.9 and 97.9 m³/s for Xinghua and Panghai, respectively. The flow distribution at Xinghua exhibits markedly heavier tails, with skewness 11.6 and kurtosis 189.1, compared with 4.4 and 28.3 at Panghai, indicating a greater propensity for extreme floods. For rainfall, the mean hourly precipitation is 1.7 mm at Panghai and 1.4 mm at Xinghua. Panghai records a larger maximum rainfall of 101.2 mm, while Xinghua reaches 91.5 mm. Rainfall variability is high at both stations, but it is slightly more intermittent and bursty at Xinghua, with skewness 6.8 and kurtosis 60.1, compared with 5.9 and 45.7 at Panghai. Solar radiation is more stable than both flow and rainfall. Panghai shows a higher mean of 1334.8 W/m² and a higher maximum of 6548.4 W/m², while Xinghua records 903.4 and 4484.7 W/m². Distributions at both stations are moderately right-skewed and light-tailed, with skewness close to 0.9 and kurtosis around −0.5 to −0.6.
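The shape statistics quoted above can be reproduced with moment-based estimators, sketched below. The negative kurtosis values reported for solar radiation imply that excess kurtosis (normal distribution = 0) is being used; the choice of population (biased) moments here is an assumption, as the estimator behind Table 5 is not specified. The sample values are hypothetical.

```python
from typing import Sequence


def _central_moment(x: Sequence[float], order: int) -> float:
    """Population central moment of the given order."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** order for v in x) / len(x)


def skewness(x: Sequence[float]) -> float:
    """Third standardized moment: positive values indicate a long right tail."""
    return _central_moment(x, 3) / _central_moment(x, 2) ** 1.5


def excess_kurtosis(x: Sequence[float]) -> float:
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    return _central_moment(x, 4) / _central_moment(x, 2) ** 2 - 3.0


# Hypothetical hourly flows: mostly low values with one flood peak.
flows = [5.0, 6.0, 5.5, 6.5, 5.0, 120.0, 7.0, 6.0]
print(round(skewness(flows), 2), round(excess_kurtosis(flows), 2))
```

A single flood peak in an otherwise steady record is enough to push both statistics well above zero, which is the pattern the flow series at Xinghua shows in a far more pronounced form.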
The experimental results shown in Table 6 indicate that the Transformer and Informer models generally underperform compared with the LSTM and GRU models across most evaluation metrics. This difference is particularly evident in the RMSE and MAE values during the wet season, which may be attributed to the stronger local temporal dependencies in the data and the requirement for Transformers to have larger datasets to effectively capture long-term dependencies [25]. This suggests that recurrent architectures retain certain advantages in river flow forecasting. LSTM and GRU are, in turn, surpassed by both the xLSTM and the proposed TAM-xLSTM models. In the prediction tasks for both wet and dry seasons at Panghai and Xinghua stations, the TAM-xLSTM model achieves the best performance, with substantially lower RMSE and MAE values than the other models. This reflects its superior generalization capability and higher prediction accuracy. Furthermore, TAM-xLSTM attains the lowest Theil’s U statistic, indicating smaller deviations between predicted and observed values as well as greater stability. It also achieves the highest NSE, signifying the most accurate fit to observed data among all models tested. In summary, TAM-xLSTM effectively captures the highly volatile river flow dynamics characteristic of the wet season, exhibiting the strongest robustness and precision while maintaining stable performance under complex hydrological conditions. Even during the dry season, characterized by weaker flow fluctuations, TAM-xLSTM continues to demonstrate strong generalization and fitting ability. These results indicate that the model offers improved accuracy for hourly river flow prediction and captures dynamic hydrological variations better than traditional and mainstream deep learning models.
During the wet season, river flow typically exhibits rapid changes and large discharge volumes. A comparison of the model prediction results presented in Figure 9 and Figure 10 reveals notable differences in the performance of various models in river flow forecasting. The LSTM model effectively captures the long-term dependencies within the time series. However, it exhibits relatively weaker performance in capturing abrupt variations in flow and in filtering high-frequency noise [26]. The Informer model surpasses both LSTM and Transformer in long-sequence prediction tasks but remains inferior to TAM-xLSTM in terms of prediction accuracy and trend fitting. Among all models, TAM-xLSTM demonstrates the highest sensitivity to dynamic variations in river flow time series, particularly excelling in the accurate prediction of abrupt variations in flow. This superior performance highlights the model’s robustness and reliability in forecasting during flood seasons.
In Guizhou Province, the presence of large dams imparts a distinct seasonal pattern to water resource regulation. Beginning in November each year, marking the start of the dry season, dams typically initiate water storage for a period before releasing it in a concentrated manner to satisfy hydropower generation demands. Consequently, river flow during the dry season generally exhibits relative stability with a narrower fluctuation range. A comparison of the model prediction results illustrated in Figure 11 and Figure 12 reveals significant differences in model performance for dry season river flow forecasting. The Transformer model is more effective at capturing the stable flow trends but lacks sensitivity to low-frequency variations, leading to considerable prediction errors for minor fluctuations [27]. The Informer model improves upon LSTM in representing the stable characteristics of the dry season flow sequence but exhibits delayed responses to sudden rainfall events [28]. The TAM-xLSTM model not only accurately captures the long-term attenuation patterns of river flow but also effectively detects subtle fluctuations during the dry season. These results indicate its robustness and clear advantages in dry season forecasting.
In addition to the deterministic metrics RMSE, MAE, Theil’s U, and NSE, this study further carried out an uncertainty analysis using a bootstrap procedure to estimate 95% confidence intervals for each model and for each season–station combination [
29], as presented in
Table 7 and
Figure 13. The inclusion of confidence intervals makes it possible to evaluate not only the average prediction error but also the robustness of the models under repeated sampling. The results indicate that the proposed TAM-xLSTM model consistently achieves the lowest RMSE and MAE values across both Panghai and Xinghua stations in wet and dry seasons. At the same time, its confidence intervals are narrower than those of the other models, which shows that the forecasts are not only more accurate but also more stable. For instance, at Panghai Station in the wet season, the RMSE of TAM-xLSTM is 16.95 m³/s, substantially lower than the values of 24.36 m³/s for LSTM and 25.13 m³/s for Transformer, and accompanied by a much tighter confidence interval. At Xinghua Station, TAM-xLSTM again delivers the smallest errors, with an RMSE of 2.91 m³/s in the wet season and 1.85 m³/s in the dry season, both with highly consistent confidence intervals. By contrast, conventional deep learning models such as Transformer and Informer tend to produce larger RMSE and MAE values together with wider confidence intervals, which reflects less stable performance. The error bar plots in
Figure 13 further highlight these differences: TAM-xLSTM consistently appears in the lowest error region while also exhibiting the smallest variability across resamples. Overall, the addition of confidence intervals provides a more comprehensive and reliable comparison of model performance. This analysis clearly demonstrates the superiority and robustness of the proposed TAM-xLSTM framework over the baseline models.
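The bootstrap procedure described above can be illustrated with a minimal sketch. It assumes a percentile bootstrap in which paired observed and predicted values are resampled with replacement; the resampling scheme, the number of resamples (1000), and the function names `rmse`, `percentile`, and `bootstrap_ci` are illustrative assumptions rather than details reported in this study. For strongly autocorrelated hourly series, a block bootstrap may be a more appropriate choice.

```python
import math
import random


def rmse(obs, pred):
    """Root mean squared error between paired observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def percentile(sorted_values, q):
    """Linearly interpolated quantile (q in [0, 1]) of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def bootstrap_ci(obs, pred, metric=rmse, n_resamples=1000, alpha=0.05, seed=0):
    """Return (point estimate, lower bound, upper bound) for a metric.

    Assumption: observation/prediction pairs are resampled independently with
    replacement, and the interval is taken from the empirical quantiles.
    """
    rng = random.Random(seed)
    n = len(obs)
    stats = []
    for _ in range(n_resamples):
        # Draw n indices with replacement and score the resampled pairs.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([obs[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    lower = percentile(stats, alpha / 2)
    upper = percentile(stats, 1 - alpha / 2)
    return metric(obs, pred), lower, upper
```

Calling `bootstrap_ci(observed, predicted)` once per model and per season–station combination would yield intervals of the kind summarized in Table 7; a narrower interval indicates that the error estimate changes little from one resample to the next.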
To further assess the robustness and generalization ability of the proposed TAM-xLSTM model under a broader range of hydrological conditions, an additional experiment was conducted using long-term daily data from the Zhijin hydrological station, located in Bijie City, Guizhou Province, China. This dataset spans from 1975 to 2023 and includes daily precipitation, temperature, wind speed, air pressure, and water level measurements. Compared with the main hourly-scale dataset from the six stations in the Qiandongnan region, the Zhijin dataset covers a much longer period but at a coarser temporal resolution. For this experiment, records from 1975 to 2010 were used for model training, records from 2011 to 2015 for validation, and records from 2016 to 2023 for testing. The same set of evaluation metrics (RMSE, MAE, Theil’s U, and NSE) was employed to ensure comparability. This supplementary analysis aims to verify whether the model can maintain its predictive performance when applied to datasets with different temporal resolutions and longer-term hydrological variability.
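The chronological partition used for the Zhijin experiment can be expressed as a short sketch. The year boundaries follow the text; the record layout (a list of `(date, values)` tuples) and the function name `split_by_year` are assumptions made for illustration. Splitting by calendar year rather than at random keeps every test sample strictly later than the training and validation samples.

```python
from datetime import date, timedelta


def split_by_year(records, train_end=2010, val_end=2015):
    """Split (date, values) records into train/validation/test sets by year.

    Years up to train_end go to training, years up to val_end go to
    validation, and all later years go to testing.
    """
    train, val, test = [], [], []
    for day, values in records:
        if day.year <= train_end:
            train.append((day, values))
        elif day.year <= val_end:
            val.append((day, values))
        else:
            test.append((day, values))
    return train, val, test


# Placeholder daily records for 1975-2023; real records would carry
# precipitation, temperature, wind speed, air pressure and water level.
start, end = date(1975, 1, 1), date(2023, 12, 31)
records = [(start + timedelta(days=i), None)
           for i in range((end - start).days + 1)]
train, val, test = split_by_year(records)
```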
Table 8 presents the comparative results of the different models on the Zhijin Station dataset. On the daily test data from 2016 to 2023, the TAM-xLSTM model achieved the best performance across all evaluation metrics. Its RMSE was 2.3380 m³/s, a 15.6% improvement over the next-best model, Informer. Its MAE of 1.0128 m³/s was the lowest among the models, demonstrating its ability to capture daily flow variations with less bias. Its Theil’s U of 0.2574 was significantly lower than that of Transformer and GRU, indicating a further reduction in the relative error between predictions and observations. Its NSE of 0.6393 represents an improvement of more than 56% relative to the other models, demonstrating its robustness and generalization capability under long-term, multi-year climate conditions. In summary, despite differences in temporal resolution and feature composition between the Zhijin Station data and the main experiment, TAM-xLSTM maintained the best performance, supporting the model’s adaptability across different time scales and hydrological conditions. This result supports the model’s application in hydrological forecasting tasks at multiple spatiotemporal resolutions.
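For reference, the four scores reported in Table 8 can be computed as in the sketch below. The RMSE, MAE, and NSE formulas are standard; Theil’s U exists in more than one form, and the U1 form (bounded between 0 and 1) is assumed here, so the definition given in the methods section of this paper takes precedence if it differs. The helper `relative_reduction` shows how a percentage improvement in an error metric over a baseline is obtained; all function names are illustrative.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def theil_u1(obs, pred):
    """Theil's U in the U1 form (assumed): RMSE scaled by the root mean
    squares of the observed and predicted series."""
    n = len(obs)
    rms_obs = math.sqrt(sum(o ** 2 for o in obs) / n)
    rms_pred = math.sqrt(sum(p ** 2 for p in pred) / n)
    return rmse(obs, pred) / (rms_obs + rms_pred)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the skill of
    always predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def relative_reduction(baseline_error, model_error):
    """Fractional error reduction of a model relative to a baseline."""
    return (baseline_error - model_error) / baseline_error
```

Under this convention, a reported 15.6% RMSE improvement corresponds to `relative_reduction(rmse_baseline, rmse_model)` returning approximately 0.156.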
As shown in
Figure 14, all deep learning models capture the overall flow trend at Zhijin Station, but their performance differs during flood peaks. The values predicted by TAM-xLSTM are closer to the measured values during most flood peak periods, demonstrating a stronger peak response. The traditional LSTM, GRU, and Transformer models underestimate or overestimate some flood peaks to a certain degree. The Informer model tracks fluctuations well during the dry season but lags slightly during the flood season. The zoomed-in panel (b) further reveals the detailed behavior of the models during a typical flood peak. The amplitude and phase of TAM-xLSTM and LSTM at multiple peak points are highly consistent with the measured flow. Although GRU and xLSTM track the trend, they show varying degrees of amplitude deviation at the peak. The Transformer and Informer models reproduce the shape of the flood peak, but their predicted peak heights are slightly lower than the measured values. Overall, TAM-xLSTM achieves the best balance between flood peak capture accuracy and dry season fitting stability, demonstrating strong generalization ability.
5. Discussion
This study demonstrates the effectiveness of the proposed TAM-xLSTM framework for hourly river flow forecasting in small- and medium-sized catchments. Across different stations and hydrological regimes, TAM-xLSTM consistently outperformed traditional LSTM, GRU, Transformer, and Informer baselines. The improvement was particularly evident during periods of rapid hydrological change, such as flood peaks and sharp recessions, underscoring the model’s ability to capture short-term fluctuations and complex nonlinear dependencies that conventional approaches often fail to represent. Several structural innovations contribute to the enhanced performance of TAM-xLSTM. The T-WaveNet module expands the temporal receptive field through dilated convolutions, enabling effective extraction of multi-scale dependencies without introducing spatial noise. In parallel, the CBAM-1D mechanism adaptively allocates attention across both channels and timesteps, strengthening the model’s sensitivity to critical temporal features such as abrupt rises or drops in flow. These enhancements allow TAM-xLSTM to achieve fine-scale hydrological responsiveness while maintaining stability in long sequences, thereby addressing the limitations of both recurrent and Transformer-based architectures in small catchments.
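The way dilated convolutions enlarge the temporal receptive field can be made concrete with a small sketch. For a stack of causal convolutions with kernel size k and dilations d_1, ..., d_L, the receptive field is 1 + (k - 1)(d_1 + ... + d_L). The kernel size of 2 and the doubling dilation schedule used below are illustrative assumptions in the style of WaveNet, not the T-WaveNet configuration adopted in this study, and both function names are hypothetical.

```python
def receptive_field(kernel_size, dilations):
    """Number of past timesteps (including the current one) that influence
    one output of a stack of causal dilated convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)


def dilated_causal_conv1d(x, weights, dilation):
    """Single-channel causal dilated convolution with implicit zero padding:
    y[t] = sum_j weights[j] * x[t - j * dilation]."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            src = t - j * dilation
            if src >= 0:  # positions before the series start count as zero
                acc += w * x[src]
        out.append(acc)
    return out


# Four layers with doubling dilations cover 16 hourly steps, whereas four
# undilated layers with the same kernel size cover only 5.
dilated = receptive_field(2, [1, 2, 4, 8])
undilated = receptive_field(2, [1, 1, 1, 1])
```

Because the receptive field grows with the sum of the dilations, a doubling schedule reaches long flow histories with few layers while the parameter count grows only linearly with depth.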
The findings of this study are consistent with, and extend, previous research on data-driven streamflow forecasting. Earlier works using LSTM and GRU demonstrated the potential of recurrent neural networks to capture temporal dependencies, but their performance often deteriorated during highly nonlinear flood events [
30]. Transformer-based methods such as Informer and Autoformer have improved long-sequence forecasting efficiency, yet their reliance on large datasets and high computational demand limits their suitability for small catchments. More recently, the modular xLSTM architecture was introduced to enhance memory capacity and temporal modeling. The present study advances this line of research by incorporating dilated convolution and attention modules, showing that such hybrid designs are particularly effective in hydrological contexts characterized by rapid and localized variability. Beyond accuracy gains, TAM-xLSTM also exhibits narrower confidence intervals than the baseline models, as demonstrated by bootstrap resampling. This reflects not only reduced prediction errors but also greater robustness, a feature of critical importance in operational forecasting, where uncertainty can be as consequential as the mean estimate. The combination of accuracy and robustness makes TAM-xLSTM especially valuable for real-time flood forecasting, where both false alarms and missed warnings carry substantial risks.
Although this study focuses on the Qiandongnan region, TAM-xLSTM is essentially a data-driven framework that can be applied to watersheds with different climatic and topographic conditions. When transferred to other basins, the model can be retrained or fine-tuned using local meteorological and hydrological data to capture region-specific flow dynamics. The modular design of TAM-xLSTM further supports its adaptability to diverse temporal patterns and hydrological responses, reinforcing its potential for broader transferability.
For practical deployment, TAM-xLSTM can be integrated into real-time flood warning systems by linking with automated meteorological and hydrological monitoring platforms. Its relatively low inference cost allows hourly forecasts to be generated on standard computational infrastructure or cloud servers. When predicted flows exceed predefined thresholds, the system can automatically issue flood alerts to local water resource authorities. Conversely, forecasts below critical levels may trigger reservoir filling operations for hydropower generation. Beyond technical deployment, the model has direct policy relevance: more reliable and timely forecasts can strengthen disaster risk reduction strategies, improve emergency preparedness, and guide reservoir operation policies on water allocation, hydropower scheduling, and ecological flow regulation.
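The threshold logic outlined above could take a form such as the following sketch. The threshold values, the action labels, and the function name `decide_action` are hypothetical; operational thresholds would be set by the responsible water resource authority for each station.

```python
def decide_action(forecast_m3s, flood_threshold, storage_threshold):
    """Map a window of forecast flows (m3/s) to a suggested action.

    A flood alert is raised if any forecast value exceeds flood_threshold;
    reservoir storage is suggested only if the whole window stays below
    storage_threshold; otherwise no action is taken.
    """
    peak = max(forecast_m3s)
    if peak > flood_threshold:
        return "flood_alert"
    if peak < storage_threshold:
        return "consider_storage"
    return "normal"


# Hypothetical six-hour forecast window and thresholds for one station.
action = decide_action([12.0, 18.5, 61.2, 40.3, 22.1, 15.0],
                       flood_threshold=50.0, storage_threshold=5.0)
```

Checking the peak of the forecast window rather than its mean keeps the rule conservative with respect to missed warnings.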
Nevertheless, several limitations should be acknowledged. First, this study focused on short-term, hourly forecasting using high-frequency meteorological and hydrological data and did not include comparisons with physically based hydrological models such as SWAT or HEC-HMS. These models are designed for long-term simulations and require extensive calibration, which was beyond the scope of this work. Future research could explore hybrid approaches that integrate physical and data-driven models to leverage their respective strengths. Second, although TAM-xLSTM performed well in multiple basins in Guizhou Province, the validation is geographically limited and does not cover diverse climatic zones or extreme real-time flood events. Broader evaluation across different regions and climates is needed to confirm generalizability. Third, flows at some stations are strongly influenced by reservoir operations. Because detailed dam operation records were not available, such effects were not explicitly incorporated, which limits applicability in regulated basins. Future work should attempt to integrate dam operation data to enhance predictive accuracy and reliability in such contexts.