Enhanced Runoff Prediction in Zijiang River Basin Using Machine Learning and SHAP-Based Interpretability

Ma, Kaiwen; Jiang, Changbo; Long, Yuannan; Wu, Zhiyuan; Yan, Shixiong

doi:10.3390/w18050601

by Kaiwen Ma ^1,2, Changbo Jiang ^1,2,3,*, Yuannan Long ^1,2,3, Zhiyuan Wu ^1,2,3 and Shixiong Yan ^1,2,3

¹ School of Hydraulic and Ocean Engineering, Changsha University of Science & Technology, Changsha 410114, China

² Hunan Provincial Key Laboratory of Water and Sediment Science and Water Disaster Prevention, Changsha 410114, China

³ Hunan Provincial Key Laboratory of Dongting Lake Water Environment Management and Ecological Restoration, Changsha 410114, China

^* Author to whom correspondence should be addressed.

Water 2026, 18(5), 601; https://doi.org/10.3390/w18050601

Submission received: 5 January 2026 / Revised: 19 February 2026 / Accepted: 24 February 2026 / Published: 2 March 2026

(This article belongs to the Special Issue Application of Machine Learning in Hydrological Monitoring)

Abstract

To address the limitations of traditional runoff prediction methods—namely, the oversimplification of meteorological factor selection, ambiguous interactions among core variables, and the disruptive influence of redundant inputs—this study focuses on the Zijiang River Basin as a representative case. A suite of machine learning models, including Long Short-Term Memory Neural Network (LSTM), Convolutional Neural Network (CNN)-LSTM, Temporal Convolutional Network (TCN), and Gradient Boosting Regression Tree (GBRT), was constructed and trained using 13 distinct combinations of meteorological variables. These configurations were systematically evaluated to assess their compatibility with each model in simulating daily runoff patterns. Additionally, the Shapley Additive Explanations (SHAP) algorithm was employed to quantitatively assess the contribution of each factor to predictive accuracy. Among the models tested, the TCN model consistently demonstrated superior performance, particularly in mitigating the effects of irrelevant or redundant features. The GBRT model showed distinctive strengths in accurately predicting peak flow timings. Of all input configurations, the combination of “runoff + precipitation + evaporation + temperature” emerged as the most effective. Findings indicate that the predictive value of individual meteorological variables hinges primarily on their direct correlation with runoff, while the effectiveness of multi-factor schemes depends on the degree of functional integration—specifically, the coupling of hydrological recharge, consumption, and regulatory processes. The presence of redundant variables was found to impair model performance unless they contributed to a meaningful synergistic relationship with core inputs. The SHAP analysis further reinforced these insights: precipitation-related variables proved to be the most critical to prediction accuracy, whereas temperature and evaporation served more complementary roles. 
Notably, the inclusion of relative humidity tended to suppress runoff responses and increased deviation in peak timing estimates. These findings shed light on the nuanced interplay between meteorological input design and model selection, offering a robust foundation for optimizing data-driven runoff prediction frameworks.

Keywords: machine learning models; Zijiang River Basin; daily runoff simulation; meteorological factors; SHAP algorithm

1. Introduction

Accurate runoff prediction serves as a fundamental pillar for flood mitigation, water resources allocation, and holistic watershed governance, directly influencing the soundness of hydrological decision-making [1,2,3]. The relationship between runoff and meteorological variables is inherently nonlinear and complex [4,5]. Traditional hydrological models (e.g., Soil and Water Assessment Tool (SWAT) [6,7] and Hydrologic Engineering Center-River Analysis System (HEC-RAS) [8]) often rely on generalized physical formulations that limit their ability to represent these intricate interactions. Such limitations can lead to substantial simulation errors, occasionally surpassing 30% during extreme rainfall scenarios [9,10,11,12,13]. In contrast, data-driven models equipped with machine learning frameworks have shown remarkable potential in capturing nonlinear dependencies, especially as advances in remote sensing have expanded the range of meteorological inputs from a conventional 5~8 variables to over 20 [14,15]. While this expansion increases the diversity and richness of training data, it also heightens the risk of overfitting when irrelevant or redundant features are included. On the other hand, focusing exclusively on precipitation may result in the loss of over 70% of valuable synergistic information among variables [16,17]. Runoff prediction in the Zijiang River Basin is further challenged by spatial and temporal heterogeneity in precipitation, ambiguous collaborative effects among meteorological inputs, and the difficulty of forecasting extreme flow events. Consequently, the thoughtful selection of meteorological factors and modeling techniques is critical to improving the predictive accuracy of data-driven approaches [18,19].

Prior research has extensively explored factor selection, model adaptability, and interpretability, offering important groundwork for this study [20,21]. Nevertheless, several limitations persist. Chen et al. [22] selected input variables—such as precipitation and temperature—primarily based on expert judgment, a method susceptible to cognitive bias and inconsistency, potentially overlooking important but non-obvious predictors. Furthermore, this approach lacks a standardized quantitative framework, often yielding divergent outcomes across experts. Huang et al. [23] employed ensemble machine learning models to evaluate different combinations of meteorological variables, finding that core-factor sets outperformed all-inclusive schemes by avoiding overfitting while preserving essential hydrological signals. Zhu et al. [24] examined the Songhua River Basin using an SVR-based multi-factor scheme, confirming both linear and nonlinear synergistic roles of precipitation and evaporation as part of a runoff “recharge-consumption” system. Liang et al. [25] emphasized that incorporating too many weakly correlated variables could introduce noise, reducing model robustness—a finding that underscores the necessity of dimensionality control. Zhang et al. [26] proposed a statistical forecasting framework combining “main factors + auxiliary factors”, highlighting the complementary effects of primary and secondary inputs and forming the conceptual foundation of the main-auxiliary synergy approach. Wang et al. [27] verified that different prediction models exhibit distinct responses to varying factor combinations in evapotranspiration interpolation on the Qinghai–Tibet Plateau (QTP). Xie et al. [28] used a Convolutional Neural Network + Long Short-Term Memory Neural Network (CNN-LSTM) structure integrating multi-scale meteorological inputs to predict subseasonal temperature and heatwaves in China, underscoring machine learning models’ adaptability to diverse input types. 
Despite these developments, the interpretability of machine learning models remains limited. As inherently black-box systems, such models do not transparently expose the contribution or interaction of input variables. Li et al. [29] introduced a SHapley Additive exPlanations (SHAP)-based feature selection strategy, effectively quantifying the influence of each feature and removing redundancy. Fan et al. [30] further utilized a “neural network + SHAP” framework to explore how meteorological variables impact precipitation intensity, while Ma et al. [31] combined SHAP with gradient boosting to identify key drivers of summer rainfall in Xinjiang. Even so, many studies adopt oversimplified input combinations, often limited to just 2 or 3 conventional variables (e.g., precipitation + temperature), and lack robust frameworks for filtering weak variables or analyzing model-factor compatibility across structures. Current research is also deficient in comprehensive cross-model comparisons and deep integration of hydrological process knowledge.

To bridge these gaps, this study investigates the Zijiang River Basin by constructing four machine learning models—LSTM, CNN-LSTM, Temporal Convolutional Network (TCN), and Gradient Boosting Regression Tree (GBRT). Based on the principle of main-auxiliary factor synergy, 13 distinct meteorological input combinations are formulated. These schemes are used to simulate daily runoff and assess predictive accuracy. Moreover, SHAP analysis is employed to interpret the contribution of each input variable, with the goal of identifying the optimal dimensionality and uncovering the interaction patterns between input configurations and model performance.

2. Study Area and Data Processing

2.1. Study Area

The Zijiang River Basin lies in the central region of Hunan Province, spanning latitudes 25°48′ N to 28°61′ N and longitudes 110°12′ E to 112°10′ E (Figure 1). The main river stretches for approximately 653 km, with an average annual discharge of around 760 m³·s⁻¹ and a total yearly runoff volume reaching 24 billion m³. Encompassing a drainage area of 28,100 km², the basin experiences an average temperature of roughly 20 °C and receives a mean annual precipitation of 1483.3 mm [32]. However, influenced by atmospheric circulation patterns and regional climatic variability, precipitation across the basin exhibits significant spatial and temporal disparities. Annual rainfall in drier zones drops to around 1200 mm, whereas the middle and lower reaches typically receive between 1500 and 1800 mm. Moreover, the region’s rainfall is markedly seasonal, with the bulk of precipitation occurring between April and July [33]. Approximately 88.56% of the Zijiang River Basin is composed of haplic acrisols, humic acrisols, cumulic anthrosols, and haplic alisols. The basin’s land use is classified into six main categories: cropland, forest land, grassland, water bodies, construction land, and unused land. Vegetative cover is notably extensive, with forested and cultivated areas together occupying more than 90% of the basin’s total surface. Grasslands, which account for roughly 3.28%, are primarily concentrated along the basin’s southern highland margins. Built-up areas are largely situated in the central and eastern zones, where the topography is relatively level and more favorable for development.

2.2. Data Processing

This study focuses on the Zijiang River Basin as the primary area of investigation. Hydrological data were collected from two key hydrological stations within the basin: the Taojiang Station, located at the basin’s downstream outlet, provided records spanning from 1971 to 2020, while the Shaoyang Station contributed data covering the period from 2001 to 2020. These data were obtained from the Hunan Provincial Hydrology and Water Resources Survey Center. Corresponding meteorological data were sourced from 17 national-level meteorological stations distributed within and around the basin, based on daily climate observations for the same period (1971–2020). The data were accessed through the China National Meteorological Data Center (http://data.cma.cn/data/, accessed on 18 June 2025), specifically from the China Surface Climate Data Daily Values Dataset (V3.0), and include 19 variables such as precipitation, temperature, evaporation, relative humidity, wind speed and direction, air pressure, ground surface temperature (0 cm), and sunlight hours [34,35,36]. Land use data were sourced from the Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences. Soil information was derived from the Harmonized World Soil Database (HWSD), which offers detailed data on soil spatial distribution and physical characteristics at a resolution of 1 × 1 km [33].

To ensure robust model training, parameter tuning, and performance validation, the entire dataset was partitioned into training and testing subsets following an 8:2 ratio. During the preprocessing stage, observed hourly runoff and meteorological records from the Zijiang River Basin were extracted according to the requirements of each experimental scenario. Short-term gaps in the time series (i.e., those lasting three days or fewer) were addressed using linear interpolation, whereas longer missing intervals (exceeding 3 days) were filled via spline interpolation. Taking the 1971–2020 period as the climatological reference, meteorological variables at all stations were systematically bias-corrected using the quantile mapping technique. All predictive variables were linearly normalized to the [−1, 1] range according to Equation (1), and additional reshaping procedures, such as data flattening, were applied to reformat the dataset for model input. To generate sequences suitable for daily runoff forecasting, the direct forecasting method was adopted to generate runoff prediction samples. This process required identifying the optimal lag length for each input feature to capture the temporal dependencies between runoff at time t and preceding observations. The Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) were utilized to determine suitable lag scales for both hydrological and meteorological variables. Based on these lag selections, a sliding window technique was applied to systematically generate the training samples. Specifically, each sample was constructed using 24 h of hourly runoff data, combined with seven consecutive days of daily precipitation and other meteorological variables, depending on the scenario. This dual-branch structure enabled the model to generate predictions for runoff 24 h in advance.

\tilde{x} = 2 \times \frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1

(1)

where \tilde{x} represents the normalized result; x represents the initial value; x_{\min} represents the minimum value of the sequence data; and x_{\max} represents the maximum value of the sequence data.
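The normalization and sample construction described in this section can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the array layout (one hourly runoff series plus a days × variables matrix), and the alignment of the daily window to the last complete day are assumptions. The scaling subtracts 1 so that the output spans the stated [−1, 1] range, and the 24 h window, 7-day window, and 24 h lead time are taken from the text.

```python
import numpy as np


def scale_to_minus1_plus1(x, x_min=None, x_max=None):
    """Linearly map x onto [-1, 1] using the sequence minimum and maximum."""
    x = np.asarray(x, dtype=float)
    x_min = x.min() if x_min is None else x_min
    x_max = x.max() if x_max is None else x_max
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0


def make_dual_branch_samples(hourly_runoff, daily_met, hours=24, days=7, lead=24):
    """Slide over the record and build (hourly window, daily window, target).

    hourly_runoff: 1-D array, one value per hour.
    daily_met:     2-D array, shape (n_days, n_variables), one row per day.
    Assumption: hour i belongs to day i // 24, and the daily branch only sees
    days that are complete at the issue hour i.
    """
    hourly_runoff = np.asarray(hourly_runoff, dtype=float)
    daily_met = np.asarray(daily_met, dtype=float)
    x_hourly, x_daily, y = [], [], []
    first_issue = days * 24 - 1               # earliest hour with 7 full days behind it
    last_issue = len(hourly_runoff) - lead    # exclusive: the target must exist
    for i in range(first_issue, last_issue):
        n_complete_days = (i + 1) // 24
        if n_complete_days > len(daily_met):
            break
        x_hourly.append(hourly_runoff[i - hours + 1 : i + 1])
        x_daily.append(daily_met[n_complete_days - days : n_complete_days])
        y.append(hourly_runoff[i + lead])     # runoff `lead` hours ahead
    return np.array(x_hourly), np.array(x_daily), np.array(y)
```

In practice the minimum and maximum would be taken from the training subset only and reused for the test subset, so that no information from the test period leaks into the scaling.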

3. Research Methodology

3.1. Machine Learning Model Structure

In this study, four machine learning structures are employed to perform daily runoff forecasting: LSTM, CNN-LSTM, TCN, and GBRT. All experiments were conducted using Python 3.7. Data preprocessing was performed with the NumPy (version 1.21.6) and pandas libraries. The dual-branch model architectures for runoff prediction were implemented in PyTorch (version 1.12.1), while hyperparameter tuning was carried out using Optuna (version 3.1.1). The LSTM model incorporates a memory cell mechanism that enables the retention of long-term dependencies, making it well-suited for modeling sequential hydrological processes. CNNs, known for their ability to extract local spatial patterns, are integrated with LSTM in the CNN-LSTM model to capture both spatial and temporal dynamics concurrently. Unlike recurrent structures, the TCN model leverages dilated and causal convolutions to process time series data, offering enhanced stability and scalability for capturing long-range temporal patterns, an essential characteristic in hydrological forecasting. GBRT, as an ensemble learning method, incrementally builds a strong predictor by aggregating multiple weak decision trees, thereby enhancing overall predictive accuracy. These four models, while differing in structure and learning mechanisms, exhibit complementary strengths. Their comparative analysis not only facilitates a nuanced understanding of performance across varying meteorological input conditions but also informs the selection of the most appropriate model for runoff prediction in subsequent stages of this research. The machine learning model framework for daily runoff prediction is shown in Figure 2.

3.1.1. LSTM

LSTM networks represent a specialized class of Recurrent Neural Networks (RNNs) designed to address the vanishing gradient problem and to capture long-range dependencies in sequential data [37,38]. In the context of the Zijiang River Basin, runoff processes display periodic variability, with accumulated precipitation and evaporation exerting lasting impacts on subsequent hydrological responses. Such temporal characteristics make LSTM particularly well-suited for this task, as its structure—comprising input, forget, and output gates—enables dynamic information retention and selective memory updates across time steps. To accommodate the temporal resolution of the dataset and the need for multi-scale feature fusion, several experiments were conducted to optimize input sequence lengths. The input sequence lengths were set to 7 days for daily data and 24 h for hourly data, with a forecasting lead time of 30 h. A three-layer dual-branch network was constructed to extract features independently from each timescale. These parallel feature streams were subsequently concatenated and passed through a fully connected layer with a GELU activation function to produce the final runoff prediction. This structure enhances the model’s ability to capture both short- and long-term temporal patterns, improving its generalization across different hydrological regimes. Key hyperparameters were configured as follows: a 3-layer stacked architecture was adopted, with each layer comprising 256 hidden units. A dropout rate of 0.2 was applied for regularization. A bidirectional LSTM was employed, and its outputs were projected back to a 256-dimensional representation via a fully connected layer. Following the concatenation of features from the dual-branch structure, a ReLU-activated fully connected layer was used to map the output to a single-dimensional runoff prediction. During training, the batch size was set to 64, and the initial learning rate was 5 × 10⁻⁵. 
The AdamW optimizer was used, incorporating a weight decay of 5 × 10⁻⁵, and the loss function was defined as Mean Squared Error Loss (MSELoss). To improve model generalization, the training data were shuffled during loading. Upon completion of training, the model yielding the lowest validation loss was selected for evaluation on the test set. The model was trained over a maximum of 500 epochs, with early stopping triggered if validation loss failed to improve over 50 consecutive iterations. The overall structure is illustrated in Figure 3.
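The checkpoint selection and early stopping rule described above can be illustrated with a small framework-independent sketch. It is not the authors' training loop: the class and function names are hypothetical, the validation losses are passed in as a plain list instead of being computed by a network, and "iterations" is interpreted here as epochs.

```python
class EarlyStopping:
    """Track the best validation loss and signal when patience runs out."""

    def __init__(self, patience=50, min_delta=0.0):
        self.patience = patience      # epochs tolerated without improvement
        self.min_delta = min_delta    # smallest decrease counted as improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.stale_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch   # a real loop would save the weights here
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


def train_with_early_stopping(val_losses, patience=50, max_epochs=500):
    """Replay a sequence of validation losses through the stopping rule."""
    stopper = EarlyStopping(patience=patience)
    last_epoch = -1
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        last_epoch = epoch
        if stopper.step(epoch, loss):
            break
    return stopper.best_epoch, stopper.best_loss, last_epoch
```

With a patience of 3, the loss sequence 1.0, 0.8, 0.9, 0.85, 0.95, 0.7 stops at epoch 4 and keeps the epoch-1 checkpoint, which shows why a generous patience (50 here) is used when the validation loss is noisy.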

3.1.2. CNN-LSTM

The CNN-LSTM model used in this study integrates CNN with LSTM networks to enhance runoff prediction performance by simultaneously capturing spatial and temporal patterns. This hybrid structure leverages the local feature extraction capability of CNNs and the sequential learning strength of LSTM [39]. The Zijiang River Basin, characterized by pronounced topographic variation, exhibits significant spatial heterogeneity in meteorological conditions such as precipitation and temperature. These upstream-downstream discrepancies contribute to localized runoff behavior. To account for such spatial and temporal complexity, a dual-branch architecture was adopted to extract temporal features at both daily and hourly scales, with a forecasting lead time of 30 h. Each branch includes two 1D convolutional layers, which are used to automatically detect critical spatial structures in multi-site meteorological data, such as rainfall intensity peaks or sudden temperature shifts. The convolutional outputs are passed through Rectified Linear Unit (ReLU) activation and max pooling layers, compressing the temporal dimension from 24 to 6 while expanding feature channels from 2 to 32. The resulting spatial representations are then forwarded to an LSTM layer, which models the sequential dependencies across time. The final output from the LSTM’s last time step is passed through a fully connected layer to generate a single runoff prediction value—effectively combining spatial encoding with temporal inference. Model parameters were tuned as follows: the time steps were set to 12 and 24; the number of filters was 16 and 32; convolution kernels were 3 × 3 in size; ReLU was adopted as the CNN activation function; the pooling layer employed a 2 × 2 window; the LSTM contained 1 layer with 64 units; Adam was selected as the optimizer, and the loss function was defined as MSELoss. During training, both training and validation losses were monitored continuously. 
The model with the lowest test loss was preserved. The training process was configured with a batch size of 64 and an initial learning rate of 1 × 10⁻⁴. The maximum number of training epochs was set to 500. An early stopping strategy was implemented, whereby training was halted if the validation loss failed to improve for 30 consecutive epochs. The overall structure is illustrated in Figure 4.
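The stated compression of the hourly branch from 24 to 6 time steps while the channels grow from 2 to 32 follows from the usual output-length arithmetic for 1D convolution and pooling. The sketch below reproduces that arithmetic under explicit assumptions that the text does not confirm: a kernel of length 3 applied along time with padding 1, stride 1, and a pooling window of 2 with stride 2 after each convolution. The function names are hypothetical.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (standard floor formula)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def pool1d_out_len(length, window, stride=None):
    """Output length of un-padded 1D max pooling."""
    stride = window if stride is None else stride
    return (length - window) // stride + 1


def branch_shapes(length=24, channels=(2, 16, 32), kernel=3, pool=2):
    """Trace (channels, time steps) through conv -> ReLU -> pool stages."""
    shapes = [(channels[0], length)]
    for out_channels in channels[1:]:
        # Assumed 'same' padding keeps the time length unchanged by the convolution.
        length = conv1d_out_len(length, kernel, padding=kernel // 2)
        length = pool1d_out_len(length, pool)
        shapes.append((out_channels, length))
    return shapes


print(branch_shapes())  # [(2, 24), (16, 12), (32, 6)]
```

Under these assumptions each stage halves the time axis, so two stages give 24 → 12 → 6, and the filter counts of 16 and 32 give the channel expansion 2 → 16 → 32.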

3.1.3. GBRT

The GBRT algorithm, originally proposed by Friedman, combines the gradient boosting framework with decision tree regression techniques to form an ensemble model composed of multiple weak learners. A predefined loss function is minimized iteratively by gradient descent: each new tree is trained to correct the residual errors of its predecessors, thereby progressively enhancing the model’s predictive accuracy [40,41]. In this study, GBRT is applied to runoff forecasting in the Zijiang River Basin, which exhibits the typical hydrological behavior of mountainous rivers, characterized by sharp fluctuations in flow and high peak discharge rates. Accurately predicting the timing of peak flows during extreme flood events is particularly challenging, yet crucial for effective flood risk management and scheduling of emergency responses [42,43]. To accommodate the temporal characteristics of the dataset, extensive experimentation was conducted to determine appropriate input sequence lengths. The optimal configuration sets the daily sequence length to 7 days and the hourly sequence to 24 h. These sequences were flattened and concatenated to produce high-dimensional input vectors, integrating multi-scale meteorological and hydrological information with their corresponding runoff targets. Hyperparameter tuning was primarily conducted via randomized search to identify the optimal GBRT configuration. The search space included: maximum tree depth ranging from 1 to 25; learning rates from 1 × 10⁻⁴ to 1; minimum samples per leaf node between 1 and 50; and minimum samples required for a split ranging from 2 to 100. A gradient boosting regressor (GBR) was employed as the base estimator, with the random seed fixed at 42. To prevent information leakage in the time-series data, 5-fold time series cross-validation (TimeSeriesSplit) was utilized. The randomized search ran for 20 iterations, with parallel processing enabled (n_jobs = −1) and verbose output (verbose = 2). 
Following model fitting on the training set, the optimal combination of hyperparameters was selected to construct the final GBRT model.
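The reason a time series split prevents leakage is that every validation fold lies strictly after the data used for fitting. The sketch below shows an expanding-window fold layout of this kind, together with a random draw of 20 candidates from the stated search space. It is an illustration only: the function names are hypothetical, the fold sizing is one common convention and may differ from the library routine the authors used, and sampling the learning rate on a log-uniform scale is an assumption.

```python
import math
import random


def time_series_folds(n_samples, n_splits=5):
    """Expanding-window folds: train on [0, start), validate on the next block."""
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test = n_samples - n_splits * test_size
    folds = []
    for start in range(first_test, n_samples, test_size):
        train_idx = list(range(0, start))
        test_idx = list(range(start, start + test_size))
        folds.append((train_idx, test_idx))
    return folds


def sample_candidates(n_iter=20, seed=42):
    """Draw hyperparameter candidates from the ranges given in the text."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_iter):
        candidates.append({
            "max_depth": rng.randint(1, 25),
            # Assumed log-uniform draw over [1e-4, 1].
            "learning_rate": 10 ** rng.uniform(math.log10(1e-4), 0.0),
            "min_samples_leaf": rng.randint(1, 50),
            "min_samples_split": rng.randint(2, 100),
        })
    return candidates


for train_idx, test_idx in time_series_folds(12, n_splits=5):
    print(train_idx[-1], test_idx)  # last training index always precedes the fold
```

Each candidate would then be scored by its mean validation error over the five folds, and the best-scoring candidate refit on the full training set.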

3.1.4. TCN

The TCN enhances conventional CNNs by introducing key architectural components such as residual connections, dilated convolutions, and causal convolutions. These improvements provide more stable gradient propagation and flexible receptive fields, effectively mitigating issues like vanishing or exploding gradients during training [44,45]. In the Zijiang River Basin, runoff processes are highly sensitive to intense, short-duration precipitation events, often leading to abrupt hydrological changes. Traditional models frequently struggle to balance robustness against noise and responsiveness to such sudden shifts. To address these challenges, the TCN model in this study employs a dual-branch structure with a 3-level residual convolution module (TCNBlock) for multi-scale feature extraction. Through extensive experimentation, the optimal input sequence lengths were set to 7 days for daily data and 24 h for hourly data. Both branches include two layers of 1D convolutions with a kernel size of 3. The number of channels followed an increasing configuration [32, 64, 128]. The use of dilated convolutions allows the model to capture long-range dependencies, while residual links enhance training stability. Following the convolutional layers, global average pooling is applied to each branch, and the resulting features are concatenated. This combined representation is then passed through a fully connected layer to produce the final runoff prediction. This structure enables the model to effectively integrate short-term fluctuations and long-term temporal trends. The key training parameters are as follows: timesteps were set to 12 and 24; the Adam optimizer was adopted; the dropout rate was configured at 0.05; the dilation rate in the residual blocks increases exponentially in powers of 2; and the loss function used was MSELoss. 
The model was trained using a batch size of 64, an initial learning rate of 1 × 10⁻³, and a weight decay coefficient of 1 × 10⁻⁴. A cosine annealing learning rate scheduler with warm restarts was employed to dynamically adjust the learning rate during training. To improve generalization, data shuffling was applied when loading the training set. Upon completion of training, the model exhibiting the lowest validation loss was selected for prediction on the test set. The model was trained for up to 500 epochs, with early stopping activated if validation loss failed to improve over 80 consecutive iterations. The full structure is illustrated in Figure 5.
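How dilated causal convolutions reach far back in time without seeing the future can be shown with a short NumPy sketch. It is illustrative only and makes assumptions beyond the text: dilations of 1, 2, and 4 for the three levels, two convolutions per level sharing one dilation, and hypothetical function names.

```python
import numpy as np


def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_j weights[j] * x[t - j * dilation], with zeros before t = 0."""
    x = np.asarray(x, dtype=float)
    k = len(weights)
    pad = (k - 1) * dilation
    padded = np.concatenate([np.zeros(pad), x])   # left padding only: causal
    return np.array([
        sum(weights[j] * padded[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])


def tcn_receptive_field(kernel_size=3, levels=3, convs_per_level=2):
    """Receptive field of stacked dilated convolutions with dilations 2**i."""
    dilations = [2 ** i for i in range(levels)]
    field = 1 + convs_per_level * (kernel_size - 1) * sum(dilations)
    return dilations, field


print(tcn_receptive_field())  # ([1, 2, 4], 29)
```

Under these assumptions the receptive field is 29 steps, which is enough to cover the 24-step hourly window and the 7-step daily window in full.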

3.2. SHAP Model Interpretation

The SHAP framework is grounded in cooperative game theory and utilizes Shapley values to quantify the contribution of each input feature to a model’s output prediction. Specifically, it measures the average marginal contribution of a feature by considering all possible permutations in which the feature can be added to the model [46,47]. In this study, the Extreme Gradient Boosting (XGBoost, version 1.21.6) model was utilized for regression tasks. The core parameters were configured as follows: the number of decision trees (n_estimators) was set to 100, and the random seed (random_state) was fixed at 42 to ensure reproducibility. All other hyperparameters were maintained at their default settings to enable SHAP-based feature importance analysis under baseline conditions. This allows for a comprehensive assessment of feature importance. For each prediction, SHAP assigns a value to every feature, representing its individualized impact on the model’s output. The formal computation of a feature’s SHAP value is expressed in Equation (2):

\phi_{i} (f) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! (|N| - |S| - 1)!}{|N|!} [f (S \cup \{i\}) - f (S)]

(2)

where N represents the set of all features; S represents a subset of features, indicating the feature set excluding feature x_{i}; f (S) represents the model output when predicting using the subset S; f (S \cup \{i\}) represents the model output when feature x_{i} is added to the subset S; and \frac{|S|! (|N| - |S| - 1)!}{|N|!} represents the weight of the SHAP value.
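Equation (2) can be evaluated exactly for a small number of features by enumerating every subset. The sketch below does this by brute force on a toy value function; it is not the tree-based approximation that SHAP libraries apply to XGBoost models, and the function names, the feature indices, and the toy payoffs are all hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all subsets S that exclude feature i."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(subset):
    """Hypothetical payoff: feature 0 adds 3, feature 1 adds 1,
    and features 0 and 2 together add a further 2 (an interaction)."""
    value = 0.0
    if 0 in subset:
        value += 3.0
    if 1 in subset:
        value += 1.0
    if 0 in subset and 2 in subset:
        value += 2.0
    return value


phi = shapley_values(toy_model, 3)
print(phi)  # approximately [4.0, 1.0, 1.0]
```

The interaction payoff of 2 is split evenly between features 0 and 2, and the values sum to f(N) − f(∅) = 6, which is the additivity property that makes per-feature contributions comparable across predictions.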

3.3. Model Accuracy Evaluation Metrics

The predictive performance of the models is assessed using four statistical metrics: Nash-Sutcliffe Efficiency Coefficient (NSE), Normalized Root Mean Square Error (NRMSE), Mean Absolute Percentage Error (MAPE), and Peak Percentage of Threshold Statistic (PPTS). These metrics collectively evaluate both the accuracy and robustness of the runoff simulations across different aspects, including overall error, relative deviation, peak flow estimation, and hydrological model efficiency. The mathematical formulations corresponding to each evaluation criterion are presented in Equations (3)–(6):

NSE = 1 - \frac{\sum_{t = 1}^{N} {(y (t) - \hat{y} (t))}^{2}}{\sum_{t = 1}^{N} {(y (t) - \bar{y} (t))}^{2}}

(3)

NRMSE = \frac{\sqrt{\frac{1}{N} \sum_{t = 1}^{N} {(y (t) - \hat{y} (t))}^{2}}}{\frac{1}{N} \sum_{t = 1}^{N} y (t)}

(4)

MAPE = \frac{1}{N} \sum_{t = 1}^{N} |\frac{y (t) - \hat{y} (t)}{y (t)}|

(5)

PPTS (\gamma) = \frac{100}{\gamma} \frac{1}{N} \sum_{t = 1}^{G} |\frac{y (t) - \hat{y} (t)}{y (t)} \times 100|

(6)

where N represents the sample size of the test set; y (t) represents the observed runoff value; \hat{y} (t) represents the predicted value; \bar{y} (t) represents the observed mean value; \gamma represents the percentage for evaluating peak prediction performance; and G represents the selected number of maximum peaks.

The NSE measures the proportion of variance in the observed data explained by the model predictions; values approaching 1 indicate superior overall predictive accuracy. The NRMSE scales the RMSE to facilitate comparison across different models or datasets, with lower values (closer to 0) denoting better performance. The MAPE provides a direct measure of prediction error, where smaller values reflect reduced deviation between predicted and observed values. The Peak Percentage of Threshold Statistic (PPTS) assesses the model’s ability to capture extreme peak events; values closer to 0 indicate more accurate peak prediction performance.
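A compact NumPy sketch of Equations (3)–(6) is given below for reference. The function names are hypothetical, and one detail is an assumption: G is taken as the largest integer not exceeding γN/100, with the G peaks chosen as the largest observed values.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency, Equation (3)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def nrmse(obs, sim):
    """RMSE divided by the observed mean, Equation (4)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((obs - sim) ** 2)) / obs.mean()


def mape(obs, sim):
    """Mean absolute relative error as a fraction, Equation (5)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.mean(np.abs((obs - sim) / obs))


def ppts(obs, sim, gamma):
    """Peak percentage of threshold statistic, Equation (6).

    Assumption: G = floor(gamma * N / 100) largest observed values are used.
    """
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    n = len(obs)
    g = int(np.floor(gamma * n / 100.0))
    top = np.argsort(obs)[::-1][:g]               # indices of the G highest flows
    rel_err = np.abs((obs[top] - sim[top]) / obs[top]) * 100.0
    return (100.0 / gamma) * rel_err.sum() / n


obs = [100.0, 200.0, 300.0, 400.0]
sim = [110.0, 190.0, 330.0, 360.0]
print(nse(obs, sim), nrmse(obs, sim), mape(obs, sim), ppts(obs, sim, 50))
```

For this toy series the squared errors sum to 2700 against a total variance term of 50,000, giving NSE = 0.946; MAPE is 0.0875; and PPTS(50) is 10, the mean percentage error over the two highest observed flows.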

The evaluation of extreme hydrological events must comprehensively account for model characteristics, input factor combinations, the suitability of performance metrics, and the underlying physical mechanisms. Accordingly, scenario-specific adjustments were made to selected evaluation metrics based on their inherent properties. For NSE, a weighted formulation was adopted to mitigate the influence of regular periods and to emphasize the model’s ability to capture extremes. Residual weights ranging from 1.5 to 2.0 were assigned to samples occurring within 3 to 7 days before and after flood peaks, thereby enhancing the sensitivity of the metric to peak events. For NRMSE, standardization was modified by replacing the mean with the observed range (“maximum-minimum”), avoiding the dampening effect that regular samples can impose on variability. MAPE was adjusted using a combined strategy of “log transformation + segmented evaluation”, allowing separate error assessment for extremely high and low flow conditions. The PPTS metric was refined by narrowing the peak identification window, thereby reducing errors associated with peak timing and magnitude. To ensure robust extreme event evaluation, extreme samples were identified using either the annual maximum method or a threshold exceedance approach, guaranteeing adequate representation of such events in the test set. Additional analyses included error distribution profiling within extreme intervals, comparative assessments across multiple models, and sensitivity analyses of key parameters.
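Two of the adjustments described above, the peak-weighted NSE and the range-normalized NRMSE, can be sketched as follows. The exact formulations are not given in the text, so this is one plausible reading under stated assumptions: a single weight value is applied within a fixed half-window around each supplied peak index, the same weights are applied to the numerator and denominator of the NSE, and the unweighted observed mean is retained. All function names are hypothetical.

```python
import numpy as np


def peak_weights(n_samples, peak_indices, half_window=3, weight=2.0):
    """Weight 1 everywhere, `weight` within half_window steps of each peak."""
    w = np.ones(n_samples)
    for p in peak_indices:
        lo = max(0, p - half_window)
        hi = min(n_samples, p + half_window + 1)
        w[lo:hi] = weight
    return w


def weighted_nse(obs, sim, weights):
    """NSE with residuals weighted so that near-peak samples count more (assumed form)."""
    obs, sim, w = (np.asarray(a, float) for a in (obs, sim, weights))
    return 1.0 - np.sum(w * (obs - sim) ** 2) / np.sum(w * (obs - obs.mean()) ** 2)


def nrmse_range(obs, sim):
    """RMSE normalized by the observed range (maximum minus minimum)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((obs - sim) ** 2)) / (obs.max() - obs.min())
```

With all weights equal to 1 the weighted NSE reduces to Equation (3), which provides a simple consistency check on any weighting scheme of this kind.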

4. Results and Analysis

4.1. Comparative Analysis of Model Accuracy

In this study, four models—LSTM, CNN-LSTM, TCN, and GBRT—were developed to conduct runoff simulations, using the Taojiang (Figure 6) and Shaoyang (Figure 7) hydrological stations as case studies. A series of meteorological factor combinations were designed as input conditions to evaluate the models’ predictive performance under varying scenarios. Quantitative assessments were carried out to examine the compatibility between different meteorological input configurations and model performance. This experimental framework was employed to validate the feasibility and effectiveness of the proposed methodology.

Drawing from previous research and the conceptual framework of synergy between main and auxiliary factors, this study classifies meteorological variables into baseline, core, and weak categories. Based on this classification, 13 meteorological factor combination schemes are developed (Table 1). Scenarios 1–2 compare the runoff simulation accuracy of each model under full-factor and core-factor input configurations. Scenarios 3–7 are designed to examine the individual effects of core factors on simulation performance. Scenarios 8–11 incrementally introduce auxiliary factors, such as temperature and humidity, to verify the synergistic effect of multi-variable input schemes and explore the optimal dimensionality for integrating main and auxiliary factors. Scenarios 12–13 evaluate the interaction mechanisms between weak and core variables, aiming to quantify the influence of weak factor dimensionality on model prediction accuracy.

Using the Shaoyang Station as an example (Figure 7), there are notable variations in metric performance across different scenarios. Scenario 8 exhibits the largest coverage area in Figure 7a and the smallest in Figure 7b–d, achieving an NSE of 0.96 and a remarkably low NRMSE of 0.0169. In contrast, Scenario 1 presents the smallest coverage in Figure 7a and the largest in Figure 7b,c, with a substantially lower NSE value of 0.43 and an NRMSE of 0.0516. Scenario 13 stands out in Figure 7d, where it yields the highest PPTS value, reaching 84.96%.

A comprehensive evaluation based on multiple performance metrics indicates that the TCN model consistently outperforms the others in terms of NSE and NRMSE. It demonstrates superior overall predictive accuracy, effective error control, and strong resilience against the influence of redundant meteorological factors. The GBRT model excels in MAPE and PPTS metrics, offering enhanced control over percentage-based errors and superior accuracy in peak flow timing, making it particularly well-suited for identifying extreme hydrological events such as flash floods. In comparison, the LSTM and CNN-LSTM models show relatively weaker performance across several indicators and are more susceptible to the adverse effects of redundant input variables. In summary, the suitability of the meteorological input scenarios, ranked from highest to lowest, is as follows: Scenario 8 > Scenario 11 > Scenario 6 > Scenario 7 > Scenario 3 > Scenario 4 > Scenario 5 > Scenario 9 > Scenario 10 > Scenario 12 > Scenario 2 > Scenario 13 > Scenario 1. The ranking of model performance in terms of runoff prediction accuracy is: TCN > GBRT > CNN-LSTM ≈ LSTM. Given its superior results in daily runoff simulation for the Zijiang River Basin, the TCN model is selected for subsequent analyses.

4.2. Runoff Prediction Performance and Driving Mechanism Analysis Under Different Meteorological Factor Combinations

The above results validate the predictive accuracy of each model under 13 different meteorological factor combinations using a multi-metric evaluation approach (Figure 8 and Figure 9). Among the tested models, the TCN model consistently delivers the best overall performance. Furthermore, its accuracy is systematically influenced by the structure of the meteorological input, highlighting the importance of input variable selection. Taking the Taojiang Station as an example (Figure 8), Scenario 8 (“runoff + precipitation + evaporation + temperature”) demonstrates the highest consistency with observed values throughout the entire simulation period, indicating optimal input suitability. In contrast, Scenario 1, which relies solely on historical runoff, fails to capture abrupt variations, while Scenario 2 exhibits reduced accuracy due to redundancy introduced by the inclusion of all meteorological factors. Scenario 13 shows significant prediction deviations, likely caused by noise from weak feature interactions. In high-flow peak periods, Scenario 8 achieves the closest alignment with observed values, whereas Scenarios 1 and 2 yield substantial peak prediction errors, highlighting the superior ability of core factor combinations to capture extreme runoff events. In medium- and low-flow conditions, Scenarios 3–11 generally follow observed fluctuation trends, with Scenario 8 again providing the highest overlap. Conversely, Scenarios 12 and 13 deviate in certain time segments, suggesting that the inclusion of weakly correlated factors increases prediction error. Overall, the optimal meteorological input structure should follow a “core-dominant + moderately auxiliary” structure. Excessively complex combinations (Scenarios 2 and 13) tend to degrade model fitting and error control. Well-structured configurations such as “runoff + precipitation + evaporation + temperature” (Scenarios 3–8) significantly improve simulation accuracy. 
Adding relative humidity (Scenarios 9–12) introduces additional bias, while relying solely on runoff (Scenario 1) limits the model’s generalization capability. Scenario 8 consistently outperforms others across extreme event simulation, mid- to low-flow fitting, overall accuracy, bias control, and peak alignment, making it an effective and robust input strategy for deep learning-based runoff prediction.

Based on the fluctuations of performance metrics across scenarios in Table 2 and Table 3, and considering the logic of single-factor priority, multi-factor synergy, and redundant factor interference, this study analyzes the underlying mechanisms by which meteorological factor combinations influence runoff prediction accuracy. Scenario 1, which excludes all meteorological variables, is used as the baseline. Accuracy differences are then evaluated after the addition of individual meteorological factors. Among the three single-factor scenarios, Scenario 3 shows the most significant improvement: an 8.7% increase in NSE, reductions in NRMSE of 6% and 7.3%, decreases in MAPE of 3.3% and 6.1%, and declines in PPTS of 2.8% and 7.3% compared to Scenarios 4 and 5. The superior performance of Scenario 3 can be attributed to the direct relationship between precipitation and surface runoff, which enables the model to capture hydrological responses more effectively. In contrast, evaporation acts as a key driver of runoff attenuation by directly reflecting water loss during the recession phase, while temperature influences runoff indirectly through its modulation of evaporation intensity. However, due to its indirect role, temperature contributes less to prediction improvement than evaporation. These results indicate that the enhancement of model accuracy through single-factor inputs is not determined by the theoretical physical importance of a variable but by its degree of direct correlation with runoff. Factors with direct hydrological impact offer greater accuracy gains than those that play a regulatory role. The relative priority of single meteorological drivers in improving model performance can be ranked as: precipitation > evaporation > temperature > weakly correlated baseline variables.
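The percentage comparisons in this subsection can be read as relative changes of a metric against a reference scenario. The text does not state the formula, so the sketch below assumes the usual definition, (new − reference) / |reference| × 100. The function name is hypothetical; the inputs are the Shaoyang values quoted in Section 4.1 for Scenario 8 and Scenario 1.

```python
def relative_change(new, ref):
    """Percentage change of `new` relative to `ref` (assumed definition)."""
    return (new - ref) / abs(ref) * 100.0


# Shaoyang Station values quoted in Section 4.1.
nse_s8, nse_s1 = 0.96, 0.43
nrmse_s8, nrmse_s1 = 0.0169, 0.0516

nse_gain = relative_change(nse_s8, nse_s1)        # positive: NSE improves
nrmse_change = relative_change(nrmse_s8, nrmse_s1)  # negative: error shrinks
```

Under this definition, Scenario 8 raises NSE by roughly 123% and lowers NRMSE by roughly 67% relative to Scenario 1. For error metrics such as NRMSE, MAPE, and PPTS, a negative change indicates an improvement.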

Using the single-core-factor scenarios as a new baseline, the effect of introducing multi-factor combinations is further examined. Scenario 8 achieves 14.6% and 16.4% increases in NSE compared to Scenarios 6 and 7, decreases in NRMSE of 59.6% and 50%, reductions in MAPE of 26.6% and 25.9%, and declines in PPTS of 35.4% and 30.3%, respectively—making it the most effective combination. The advantage of Scenario 8 lies in its integrated coupling of water and energy variables: precipitation provides the source for runoff generation, evaporation accounts for water consumption, and temperature regulates the efficiency of coupling. Together, these three factors form a complete hydrological cycle—covering supply, consumption, and regulation—which significantly enhances model accuracy. While precipitation and evaporation help capture the runoff balance process, their inability to fully regulate runoff dynamics results in reduced accuracy in peak flow timing. Meanwhile, the pairing of precipitation with temperature offers a weaker synergistic effect due to the lack of direct interaction, which explains why Scenario 7 performs worse than Scenario 6. In summary, the benefits of multi-factor combinations are not additive but depend on whether the selected variables form a closed-loop system. Combinations that comprehensively address supply, consumption, and regulation yield markedly superior predictive accuracy and compensate for weaknesses in individual metrics, such as peak flow estimation.

Finally, to assess the impact of redundant or weak variables, core combinations are used as baselines. When comparing Scenario 9 to Scenario 6, the TCN model exhibits a 16.9% decrease in NSE, along with increases in NRMSE (52%), MAPE (42.2%), and PPTS (56.7%). Similarly, Scenario 10 shows a 20.7% drop in NSE and corresponding increases in NRMSE (44%), MAPE (49.5%), and PPTS (63.4%) relative to Scenario 7. In Scenario 11, the degradation is less severe: a 6.3% decrease in NSE, 45.5% increase in NRMSE, 15.9% rise in MAPE, and a 10.6% increase in PPTS compared to Scenario 8. While the accuracy of Scenario 10 is slightly better than that of Scenario 9, both scenarios exhibit more substantial performance degradation than Scenario 11. This variation can be explained by the strong correlation between relative humidity and both evaporation and temperature. In Scenarios 6 and 7, where such synergistic factors are absent, adding relative humidity introduces redundant information that exacerbates overfitting. In contrast, when relative humidity is combined with temperature (as in Scenario 11), it contributes to the estimation of actual evapotranspiration efficiency, resulting in only mild interference. In conclusion, the degree of interference introduced by redundant variables depends on their secondary synergistic relationships with core factors. When redundant inputs form refined functional linkages with core variables, they may partially offset their negative impact on model performance.
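Since the interference of relative humidity is attributed above to its strong correlation with evaporation and temperature, a simple pairwise correlation screen is one way to flag such candidates before training. The sketch below is illustrative only: the study does not report applying such a screen, and the function names, the 0.8 threshold, and the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def redundant_pairs(features, threshold=0.8):
    """List feature pairs whose absolute correlation reaches `threshold`."""
    names = sorted(features)
    flagged = []
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            r = pearson_r(features[p], features[q])
            if abs(r) >= threshold:
                flagged.append((p, q, round(r, 3)))
    return flagged


# Hypothetical daily values: humidity falls as temperature rises.
features = {
    "temperature": [10, 14, 18, 22, 26, 30],
    "relative_humidity": [90, 84, 80, 71, 66, 60],
    "precipitation": [0, 12, 1, 30, 0, 5],
}

pairs = redundant_pairs(features)
```

A flagged pair is not automatically harmful. As the Scenario 11 result suggests, a correlated variable may still be worth keeping when its partner variables are already part of the input set.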

4.3. SHAP Model Interpretability Analysis

To quantify the influence of meteorological factors on runoff simulation accuracy and to clarify their relative importance, this study employs the SHAP method. Using the Taojiang Station as a case study, the model’s predictions were decomposed into the individual contributions of each input variable, thereby revealing the contribution patterns of meteorological factors within the TCN model. As illustrated in Figure 10 and summarized in Table 4, precipitation-related variables emerge as the primary drivers. The SHAP values for daily precipitation are the most dispersed and have the highest absolute magnitude (7.8189), indicating a strong and abrupt positive influence on runoff. This variable also exhibits the widest value distribution range, further emphasizing its dominant role. Temperature exhibits a relatively broad SHAP value distribution as well, with most high-magnitude values corresponding to positive contributions (maximum value: 6.4520). In contrast, evaporation displays SHAP values of both signs, with the extreme value being negative (−7.4823), reflecting a dual influence. This factor functions in coordination with temperature to regulate the water loss process, particularly during the recession phase of runoff. Relative humidity shows densely distributed and consistently negative SHAP values (−5.3683), positioning it as a key negative influence that inhibits runoff generation or amplification. Other variables, such as wind speed and atmospheric pressure, display tightly clustered SHAP values with small absolute magnitudes and limited variance. These results suggest that such variables exert minimal direct influence on runoff and serve only minor auxiliary functions. In summary, precipitation-related factors are identified as the core determinants of runoff dynamics. Temperature and evaporation act as indirect synergistic drivers, relative humidity functions as the principal suppressive factor, and all other meteorological variables provide only marginal supporting effects within the model.
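The decomposition of a prediction into per-variable contributions can be made concrete with an exact Shapley computation on a toy model. This is a sketch under stated assumptions, not the explainer used in the study (which is not specified here): absent variables are replaced by a fixed baseline value, all coalitions are enumerated, and the response function `toy_runoff` with its coefficients and inputs is hypothetical. Practical SHAP explainers approximate the same quantity, because full enumeration grows exponentially with the number of inputs.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with features outside a coalition
    set to their `baseline` value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_runoff(v):
    """Hypothetical response: v = [precipitation, evaporation, temperature].
    Precipitation adds runoff, evaporation removes it, and temperature
    only acts through an interaction with precipitation."""
    precipitation, evaporation, temperature = v
    return 3.0 * precipitation - 1.0 * evaporation \
        + 0.5 * precipitation * temperature


x = [10.0, 4.0, 20.0]        # day being explained (hypothetical)
baseline = [2.0, 3.0, 15.0]  # reference conditions (hypothetical)
phi = shapley_values(toy_runoff, x, baseline)
```

The contributions sum to the difference between the prediction at `x` and the prediction at the baseline, which is the additivity property that allows SHAP values to be compared across variables. In this toy model the evaporation term is negative and the temperature term arises only from its interaction with precipitation.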

5. Discussion

This study focused on the Zijiang River Basin and developed four machine learning models—LSTM, CNN-LSTM, TCN, and GBRT—guided by the principle of main–auxiliary factor synergy. Thirteen meteorological input combinations were designed and integrated with SHAP-based interpretability analysis to support both accurate runoff prediction and transparent model interpretation. The study achieved significant advancements in input configuration design, model adaptability, and the identification of optimal input dimensionality. Moreover, it validated common patterns observed in previous research and underscored the practical value of machine learning models for daily runoff forecasting and flood event management.

The core findings of this study are in alignment with, and extend, existing work in hydrology and machine learning. A systematic comparison of the four models revealed not only their respective strengths and limitations, but also differences in their sensitivity to input variables and compatibility with the basin’s hydrological characteristics. Among the models, TCN demonstrated superior performance, benefitting from its use of dilated and causal convolutions in effectively capturing abrupt runoff variations and managing redundant inputs—consistent with the findings of Xu et al. [45]. The GBRT model excelled in flood peak prediction, reflecting its robustness in handling outliers, in agreement with Wang et al. [40]. Notably, in mountainous river systems characterized by rapid runoff fluctuations (e.g., Zijiang Basin), GBRT achieved a 5% lower peak timing error than TCN, offering a more targeted solution for flood risk management. LSTM and CNN-LSTM yielded moderate performance levels, corroborating observations by Sadiki et al. [37], who noted LSTM’s susceptibility to performance degradation from redundant inputs. However, the dual-branch structure employed in CNN-LSTM proved valuable for capturing multi-scale temporal dynamics. In a single-branch network, the two datasets must be directly concatenated into a single sequence, which tends to cause an imbalance in feature weight distribution. By contrast, the dual-branch architecture first decouples the two datasets and extracts features individually before fusing them via concatenation. Such a structure can fully characterize the bidirectional temporal dependencies of hydrological series, avoid mutual interference between data at different scales, and effectively capture their interactions. This perspective is supported by the work of Chen et al. [43]. 
Compared to traditional hydrological models, the machine learning approaches in this study delivered 20~30% higher predictive accuracy than SWAT models applied in comparable basins [48], emphasizing the strength of data-driven models in representing complex, nonlinear hydrological processes.
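The dilated causal convolution credited above for the TCN’s behavior can be sketched in a few lines. The code is a didactic, single-channel version written for this illustration, not the network used in the study; the function names and sample values are hypothetical. The receptive-field formula assumes one convolution per dilation level, whereas residual TCN blocks with two convolutions per level cover a longer history.

```python
def causal_dilated_conv(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with implicit zero
    padding on the left so that y[t] never depends on future inputs."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of past time steps seen by a stack with one causal
    convolution per dilation level."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = causal_dilated_conv(x, kernel=[0.5, 0.5], dilation=2)

# Changing the last input must leave all earlier outputs untouched.
x_future_changed = [1.0, 2.0, 3.0, 4.0, 99.0]
y_changed = causal_dilated_conv(x_future_changed, kernel=[0.5, 0.5], dilation=2)

rf = receptive_field(kernel_size=2, dilations=[1, 2, 4, 8])
```

With a kernel of size 2 and dilations 1, 2, 4, and 8, four stacked layers already see 16 days of history, which illustrates how a TCN can cover long antecedent periods with few parameters while remaining strictly causal.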

Meteorological input combinations were found to exert a significant influence on model performance. Grounded in the synergy between core and auxiliary factors, the results supported the effectiveness of a “core-dominant + moderately auxiliary” input structure. This aligns with the “primary + auxiliary factor” framework proposed by Zhang et al. [26], but the present study extends that work by evaluating 13 progressively graded input scenarios. The results demonstrate that redundant variables are not inherently detrimental; their impact is contingent on their secondary interactions with core predictors. This insight addresses a common ambiguity in prior studies regarding variable combination effects. Zhu et al. [24] confirmed the “supply-consumption” interaction between precipitation and evaporation. Building on this, this study employed SHAP analysis to quantify their relative contributions in the Zijiang Basin and further identified temperature as a regulatory factor; together, the three variables form a “supply-consumption-regulation” mechanism. This contribution enhances our understanding of multi-factor hydrological interactions and is consistent with Mianabadi et al. [49] and Berghuijs et al. [50], who examined the balance between precipitation input and evaporative loss. While Liang et al. [25] highlighted the risk of noise introduced by weakly correlated features, our comparison of Scenarios 12 and 13 identified a threshold beyond which such variables degrade model performance, offering quantitative guidance for feature selection that aligns with the findings of Khozani et al. [51]. SHAP results identified precipitation as the most influential variable, echoing the findings of Li et al. [52], who reported that precipitation accounted for 68.2% of baseflow variation. The observed negative contribution of relative humidity also supports conclusions drawn by Kim et al. [53]. Additionally, this study delineated a dimensionality threshold for input variables, addressing a frequently overlooked source of uncertainty in model configuration.

Despite its contributions, the study has certain limitations that suggest directions for future work. The optimal input configuration—runoff, precipitation, evaporation, and temperature—proved effective under the subtropical monsoon climate of the Zijiang Basin, but its generalizability to arid regions or high-latitude basins with freeze–thaw processes remains to be assessed. Further studies are needed to evaluate model transferability across diverse climatic and hydrological contexts. Minimum data requirements vary by model. In this study, data availability satisfied the input thresholds for all models [54,55]. For data-scarce basins, the “core + auxiliary” input strategy can still improve accuracy under limited-data conditions, although performance for extreme event prediction may decline by 15~20% relative to data-rich scenarios. Moreover, the straightforward concatenation adopted for feature fusion in the dual-branch structure may inadequately capture the nonlinear interactions between hydrological and meteorological variables. The adoption of more sophisticated fusion mechanisms is expected to further enhance model performance. Future work should explore SHAP-based feature selection strategies, such as those proposed by Li et al. [29], to quantify model-specific sensitivity thresholds and support adaptive variable selection under varying hydrological conditions. Cross-basin validation, integration of regional hydrological characteristics into input design, and the application of data augmentation techniques in data-sparse regions are also recommended. Additionally, hybrid modeling frameworks that combine physically based hydrological models with data-driven approaches may strike a balance between interpretability and predictive accuracy, thereby enhancing the reliability and robustness of runoff simulation and forecasting.

6. Conclusions

This study develops four machine learning models for daily runoff prediction and designs 13 meteorological factor combination schemes to evaluate model performance. Through comprehensive simulations, the study investigates the adaptability of different models to various input combinations and identifies the optimal meteorological input dimensionality. The key conclusions are outlined as follows:

(1): Among all tested configurations, the TCN model demonstrates the best overall performance under the core input combination of “runoff + precipitation + evaporation + temperature” (Scenario 8), achieving an NSE of 0.96. It also shows strong resilience to redundant factor interference (Scenarios 9–12). The GBRT model performs well in terms of percentage error and peak flow timing prediction. In contrast, the LSTM and CNN-LSTM models are more sensitive to redundant inputs, with substantial increases in prediction errors under full-factor scenarios.
(2): Simulation accuracy improves progressively across core factor combinations (Scenarios 3–8), indicating that adding key variables enhances model performance without triggering efficiency loss. Scenario 8 provides the optimal input dimensionality across all models. Conversely, using full-factor inputs (Scenario 2) or combinations with weakly correlated variables (Scenario 13) introduces significant noise, confirming the hypothesis that “excessive input dimensionality reduces learning efficiency”.
(3): The predictive value of individual meteorological factors is governed by their degree of direct correlation with runoff processes. Variables that directly influence runoff generation or consumption yield far greater improvements in accuracy than those with indirect regulatory effects. Multi-factor combinations do not exhibit additive accuracy gains; rather, the presence of a complete coupling mechanism—encompassing supply, consumption, and regulation—is essential for optimal performance. Scenario 8 improves NSE by 7.9~10.3% compared to binary combinations (Scenarios 6 and 7) and mitigates large deviations in peak flow timing caused by single-factor limitations. The interference effects of redundant factors are closely tied to their secondary synergy with core variables. For instance, when relative humidity is paired with temperature, the resulting decrease in NSE is reduced from 16.9~20.7% to just 6.3%.
(4): SHAP analysis reveals that daily precipitation has the largest-magnitude SHAP value (7.8189), identifying it as the most influential driver of runoff. Temperature (6.4520) and evaporation (−7.4823) operate as complementary, indirect regulators, while relative humidity (−5.3683) acts as a suppressive factor. The inclusion of relative humidity and other low-impact variables can exacerbate biases, particularly in peak flow timing predictions. For reference, in Scenario 1, which uses only runoff input, the MAPE deviation reaches 59.91%. From an application perspective, a three-factor meteorological input scheme of “precipitation + evaporation + temperature” is recommended.

Future research should focus on validating model adaptability across different river basins and temporal scales to further refine runoff simulation systems.

Author Contributions

Conceptualization, K.M.; methodology, K.M.; software, K.M. and C.J.; validation, Y.L. and Z.W.; formal analysis, K.M. and S.Y.; investigation, K.M.; resources, Y.L.; data curation, K.M., C.J. and S.Y.; writing—original draft preparation, K.M., C.J. and S.Y.; writing—review and editing, C.J., Y.L. and Z.W.; visualization, K.M.; supervision, C.J., Y.L. and Z.W.; project administration, C.J. and Z.W.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 52079010 and the Graduate Research and Innovation Project of Changsha University of Science & Technology, grant number CSLGCX223057.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

LSTM	Long Short-Term Memory Neural Network
CNN	Convolutional Neural Network
TCN	Temporal Convolutional Network
GBRT	Gradient Boosting Regression Tree
SHAP	Shapley Additive Explanations
SWAT	Soil and Water Assessment Tool
HEC-RAS	Hydrologic Engineering Center-River Analysis System
SVR	Support Vector Regression
DEM	Digital Elevation Model
RNN	Recurrent Neural Network
MSELoss	Mean Squared Error Loss
NSE	Nash-Sutcliffe Efficiency Coefficient
NRMSE	Normalized Root Mean Square Error
MAPE	Mean Absolute Percentage Error
PPTS	Peak Percentage of Threshold Statistic

References

1. Abdi, E.; Sattari, M.T.; Samadianfard, S.; Ahmad, S. Advancing Hydrological Prediction with Hybrid Quantum Neural Networks: A Comparative Study for Mile Mughan Dam. Water 2025, 17, 3592.
2. Lavers, D.A.; Harrigan, S.; Andersson, E.; Richardson, D.S.; Prudhomme, C.; Pappenberger, F. A Vision for Improving Global Flood Forecasting. Environ. Res. Lett. 2019, 14, 121002.
3. Sheffield, J.; Wood, E.F.; Pan, M.; Beck, H.; Coccia, G.; Serrat-Capdevila, A.; Verbist, K. Satellite remote sensing for water resources management: Potential for supporting sustainable development in data-poor regions. Water Resour. Res. 2018, 54, 9724–9758.
4. De, S.; Farzad, R.; Brewick, P.T.; Johnson, E.A.; Wojtkiewicz, S.F. Likelihood level adapted estimation of marginal likelihood for Bayesian model selection. Comput. Methods Appl. Mech. Eng. 2025, 445, 118141.
5. Brewick, P.T.; Farzad, R. Hierarchical Bayesian calibration of Bouc–Wen hysteretic models with applications to seismic isolators. Mech. Syst. Signal Process. 2025, 237, 113021.
6. Jeyrani, F.; Morid, S.; Srinivasan, R. Assessing basin blue–green available water components under different management and climate scenarios using SWAT. Agric. Water Manag. 2021, 256, 107074.
7. Luo, M.; Liu, T.; Meng, F.; Duan, Y.; Huang, Y.; Frankl, A.; De Maeyer, P. Proportional coefficient method applied to TRMM rainfall data: Case study of hydrological simulations of the Hotan River Basin (China). Water Clim. Change 2017, 8, 627–640.
8. Akiyanova, F.; Ongdas, N.; Zinabdin, N.; Karakulov, Y.; Nazhbiyev, A.; Mussagaliyeva, Z.; Atalikhova, A. Operation of Gate-Controlled Irrigation System Using HEC-RAS 2D for Spring Flood Hazard Reduction. Computation 2023, 11, 27.
9. Park, N.; Kim, S.; Seo, I.; Yoon, S. Application of LPCF model based on ARIMA model to prediction of water quality change in water supply system. Desalin. Water Treat. 2021, 212, 8–16.
10. Cui, L.; Wang, Y.; Zhang, H.; Lv, X.; Lei, K. Use of non-linear multiple regression models for setting water quality criteria for copper: Consider the effects of salinity and dissolved organic carbon. J. Hazard. Mater. 2023, 450, 131107.
11. Avila, R.; Horn, B.; Moriarty, E.; Hodson, R.; Moltchanova, E. Evaluating statistical model performance in water quality prediction. J. Environ. Manag. 2018, 206, 910–919.
12. Fernandes, A.P.; Fonseca, A.R.; Pacheco, F.A.L.; Fernandes, L.S. Water quality predictions through linear regression-A brute force algorithm approach. MethodsX 2023, 10, 102153.
13. Osmane, A.; Zidan, K.; Benaddi, R.; Sbahi, S.; Ouazzani, N.; Belmouden, M.; Mandi, L. Assessment of the effectiveness of a full-scale trickling filter for the treatment of municipal sewage in an arid environment: Multiple linear regression model prediction of fecal coliform removal. J. Water Process Eng. 2024, 64, 105684.
14. Ewnetu, S.S.; Dessie, M.; Belete, M.A.; van Griensven, A.; Walraevens, K.; Frankl, A.; Adgo, E.; Verhoest, N.E.C. Spatial and Temporal Evaluation of Gridded Precipitation Products over the Mountainous Lake Tana Basin, Ethiopia. Water 2025, 17, 3536.
15. Ma, Q.; Ma, T.; Lu, C.; Cheng, B.; Xie, S.; Gong, L.; Fu, Z.; Liu, C. A Cloud-based Quadruped Service Robot with Multi-Scene Adaptability and various forms of Human-Robot Interaction. IFAC Papers OnLine 2020, 53, 134–139.
16. Su, L.; Miao, C.; Duan, Q.; Lei, X.; Li, H. Multiple-wavelet coherence of world’s large rivers with meteorological factors and ocean signals. J. Geophys. Res. Atmos. 2019, 124, 4932–4954.
17. Yuan, Y.; Zhou, C.; Wu, J.; Deng, F.; Liu, W.; Sun, M.; Li, L. An Interpretable Deep Learning Framework for River Water Quality Prediction—A Case Study of the Poyang Lake Basin. Water 2025, 17, 2496.
18. Chen, Y.; Song, L.; Liu, Y.; Yang, L.; Li, D. A Review of the Artificial Neural Network Models for Water Quality Prediction. Appl. Sci. 2020, 10, 5776.
19. Zhi, W.; Appling, A.P.; Golden, H.E.; Podgorski, J.; Li, L. Deep learning for water quality. Nat. Water 2024, 2, 228–241.
20. Wellen, C.; Kamran-Disfani, A.-R.; Arhonditsis, G.B. Evaluation of the Current State of Distributed Watershed Nutrient Water Quality Modeling. Environ. Sci. Technol. 2015, 49, 3278–3290.
21. Hawtree, D.; Mellander, P.-E.; Adams, R.; Ezzati, G.; Jackson-Blake, L.; Zurovec, O.; Norling, M.; Galloway, J. Application of a Parsimonious Phosphorus Model (SimplyP) to Two Hydrologically Contrasting Agricultural Catchments. Water 2026, 18, 6.
22. Chen, Y.C.; Yu, S.R.; Yang, H.C.; Kuo, J.J.; Zeng, M.Y. Fast and Minimally Intrusive Method for Measuring Tidal-Stream Discharge. J. Hydrol. Eng. 2014, 20, 06014011.
23. Huang, J.H.; Wang, Z.C.; Wu, J.H.; Yao, Z.Y. Research on Runoff Interval Prediction Based on Deep Learning Ensemble Optimization Model. J. Hydraul. Eng. 2025, 56, 240–252, 265.
24. Zhu, C.M.; Wu, H.J.; Song, X.Y.; Song, S.B. Application of SVR Model Based on Multi-Factor Combination in Runoff Forecasting of the Songhua River Basin. Water Resour. Power 2021, 39, 12–15 + 41.
25. Liang, H.; Lin, Y.; Yang, G.; Su, Z.; Wang, W.; Guo, F. Application of Random Forest Algorithm Based on Meteorological Factors in Forest Fire Prediction in the Tahe Area. Sci. Silvae Sin. 2016, 52, 89–98.
26. Zhang, J.C.; Zhao, Q.; Xu, X.J. Combined Factor Method in Statistical Forecasting. Chin. J. Atmos. Sci. 1978, 2, 48–54.
27. Wang, X.Y.; Chen, Q.; Du, H.L.; Zhang, R.; Ma, H.L. Study on Evapotranspiration Interpolation in Alpine Wetlands of the Qinghai-Tibet Plateau Based on Machine Learning. Chin. J. Plant Ecol. 2023, 47, 912–921.
28. Xie, J.; Hsu, P.C.; Hu, Y.; Zhang, H.; Ye, M. Advancing subseasonal surface air temperature and heat wave prediction skill in China by incorporating scale interaction in a deep learning model. Geophys. Res. Lett. 2024, 51, e2024GL111076.
29. Li, R.; Feng, K.; An, T.; Cheng, P.; Wei, L.; Zhao, Z.; Xu, X.; Zhu, L. Enhanced Insights into Effluent Prediction in Wastewater Treatment Plants: Comprehensive Deep Learning Model Explanation Based on SHAP. ACS ES&T Water 2024, 4, 1904–1915.
30. Fan, Z.X.; Wang, Y.; Wang, R.T. Precipitation Forecasting Based on Interpretability of Neural Network Models. J. Trop. Meteorol. 2024, 40, 1030–1044.
31. Ma, C.Z.; Yao, J.Q.; Mo, Y.X.; Zhou, G.X.; Xu, Y.; He, X.M. Prediction of summer precipitation via machine learning with key climate variables: A case study in Xinjiang, China. J. Hydrol. Reg. Stud. 2024, 56, 101964.
32. Li, B.; Zhang, X.P.; Yang, L.; Xia, Y. Spatiotemporal Variation Characteristics and Recurrence Period Calculation of Extreme Precipitation in the Zishui River Basin, Hunan Province. J. Irrig. Drain. 2019, 38, 117–128.
33. Long, Y.N.; Zhang, Y.L.; Jiang, C.B.; Mo, J.C.; Huang, C.F.; Song, X.Y. Runoff Response of the Zishui River Basin Under Climate Change Based on CMIP6. Res. Soil Water Conserv. 2024, 31, 114–125.
34. Muñoz-Sabater, J.; Dutra, E.; Agustí-Panareda, A.; Albergel, C.; Arduini, G.; Balsamo, G.; Boussetta, S.; Choulga, M.; Harrigan, S.; Hersbach, H.; et al. ERA5-Land: A State-of-the-Art Global Reanalysis Dataset for Land Applications. Earth Syst. Sci. Data 2021, 13, 4349–4383.
35. Yang, G.; Shi, H.J.; Jiang, Y.M.; Wu, Y.F.; Wang, Y.; Li, J. Spatiotemporal Variation Characteristics and Influencing Factors of Drought on the Loess Plateau Based on Daily-Scale SPEI. Res. Soil Water Conserv. 2025, 32, 244–254.
36. Li, Z.; Peng, S.; Zheng, G.; Chu, X.; Tian, Y. Prediction of Daily Water Consumption in Residential Areas Based on Meteorologic Conditions—Applying Gradient Boosting Regression Tree Algorithm. Water 2023, 15, 3455.
37. Sadiki, N.; Jang, D.-W. Estimation of Hydraulic and Water Quality Parameters Using Long Short-Term Memory in Water Distribution Systems. Water 2024, 16, 3028.
38. Agarwal, H.; Mahajan, G.; Shrotriya, A.; Shekhawat, D. Predictive data analysis: Leveraging RNN and LSTM techniques for time series dataset. Procedia Comput. Sci. 2024, 235, 979–989.
39. Swiderski, B.; Osowski, S.; Gwardys, G.; Kurek, J.; Slowinska, M.; Lugowska, I. Random CNN structure: Tool to increase generalization ability in deep learning. Eurasip. J. Image Video Process. 2022, 2022, 3.
40. Wang, G.Q.; Ruan, Y.L.; Wang, H.X.; Zhao, G.; Cao, X.X.; Li, X.M.; Ding, Q.J. Tribological performance study and prediction of copper coated by MoS₂ based on GBRT method. Tribol. Int. 2023, 179, 108149.
41. Zhang, Z.; Yang, W.; Wushour, S. Traffic Accident Prediction Based on LSTM-GBRT Model. J. Control Sci. Eng. 2020, 2020, 4206919.
42. Xiang, X.; Guo, S.L.; Li, C.L.; Wang, Y. An explainable deep learning model based on hydrological principles for flood simulation and forecasting. Hydrol. Earth Syst. Sci. 2025, 29, 7217–7239.
43. Chen, C.; Li, B.; Zhang, H.; Zhao, M.; Liang, Z.; Li, K.; An, X. Performance enhancement of deep learning model with attention mechanism and FCN model in flood forecasting. J. Hydrol. 2025, 658, 133221.
44. Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271.
45. Xu, Y.; Hu, C.; Wu, Q.; Li, Z.; Jian, S.; Chen, Y. Application of temporal convolutional network for flood forecasting. Hydrol. Res. 2021, 52, 1455–1468.
46. Soleymani Hasani, S.; Arias, M.E.; Nguyen, H.Q.; Tarabih, O.M.; Welch, Z.; Zhang, Q. Leveraging explainable machine learning for enhanced management of lake water quality. J. Environ. Manag. 2024, 370, 122890.
47. Liu, Y.C.; Liu, Z.H.; Luo, X.; Zhao, H.H.T. Diagnosis of Parkinson’s disease based on SHAP value feature selection. Biocybern. Biomed. Eng. 2022, 42, 856–869.
48. Si, W.A.; Huang, Y.; Liu, T.; Li, Z.X.; Zan, C.J.; Wang, X.F. Runoff Simulation in the Source Area of the Yarkant River Based on Deep Learning and Air Temperature Spatial Field. Prog. Geogr. 2025, 44, 631–641.
49. Mianabadi, A.; Coenders-Gerrits, M.; Shirazi, P.; Bijan Ghahraman, B.; Alizadeh, A. A global Budyko model to partition evaporation into interception and transpiration. Hydrol. Earth Syst. Sci. 2019, 23, 4983–5000.
50. Berghuijs, W.R.; Larsen, J.R.; van Emmerik, T.H.M.; Woods, R.A. A Global Assessment of Runoff Sensitivity to Changes in Precipitation, Potential Evaporation, and Other Factors. Water Resour. Res. 2017, 53, 8475–8486.
Khozani, Z.S.; Precht, E.; Ionita, M. Weekly streamflow forecasting of Rhine river based on machine learning approaches. Nat. Hazards 2025, 121, 4135–4153. [Google Scholar] [CrossRef]
Li, J.; Sheng, F.; Liu, S.Y.; Zhang, T.; Yu, M.Q. Characteristics of baseflow variation and its response to precipitation in the Jiuqushui watershed of southern Jiangxi, subtropical China. Chin. J. Appl. Ecol. 2022, 33, 2251–2259. [Google Scholar]
Kim, Y.; Garcia, M.; Morillas, L.; Weber, U.; Black, T.A.; Johnson, M.S. Relative humidity gradients as a key constraint on terrestrial water and energy fluxes. Hydrol. Earth Syst. Sci. 2021, 25, 5175–5196. [Google Scholar] [CrossRef]
Fathi, M.M.; Al Mehedi, M.A.; Smith, V.; Fernandes, A.M.; Hren, M.T.; Terry, D.O., Jr. Evaluation of LSTM vs. conceptual models for hourly rainfall runoff simulations with varied training period lengths. Sci. Rep. 2025, 15, 15820. [Google Scholar] [CrossRef] [PubMed]
Zhang, M.F.; Ding, B.B.; Jia, G.D.; Yu, X.X. Comparative Runoff Prediction for the Beiluo River Based on TCN-BiLSTM and LSTM Models. J. Beijing For. Univ. 2024, 46, 141–148. [Google Scholar]

Figure 1. (a) Zijiang River Basin Station distribution map; (b) Zijiang River Basin DEM map.

Figure 2. Framework of Machine Learning Models for Daily Runoff Prediction.

Figure 3. Structure of LSTM model.

Figure 4. Structure of CNN-LSTM model.

Figure 5. Structure of TCN model.

Figure 6. Radar charts of prediction-accuracy metrics for each model across the input scenarios at Taojiang Station: (a) NSE; (b) NRMSE; (c) MAPE; (d) PPTS.

Figure 7. Radar charts of prediction-accuracy metrics for each model across the input scenarios at Shaoyang Station: (a) NSE; (b) NRMSE; (c) MAPE; (d) PPTS.

Figure 8. Comparison of observed and predicted runoff under different input scenarios at Taojiang Station.

Figure 9. Comparison of observed and predicted runoff under different input scenarios at Shaoyang Station.

Figure 10. SHAP Summary plot of meteorological feature importance.

Table 1. Meteorological input feature combination schemes.

Scenario No.	Input Feature Combination	Basis for Construction
Scenario 1	Single runoff	Baseline control group
Scenario 2	Runoff + All meteorological factors	Baseline control group
Scenario 3	Runoff + Precipitation	Core factor dominance validation
Scenario 4	Runoff + Evaporation
Scenario 5	Runoff + Temperature
Scenario 6	Runoff + Precipitation + Evaporation
Scenario 7	Runoff + Precipitation + Temperature
Scenario 8	Runoff + Precipitation + Evaporation + Temperature	Multi-factor synergy validation
Scenario 9	Runoff + Precipitation + Evaporation + Relative humidity
Scenario 10	Runoff + Precipitation + Temperature + Relative humidity
Scenario 11	Runoff + Precipitation + Evaporation + Temperature + Relative humidity
Scenario 12	Runoff + Precipitation + Evaporation + Temperature + Relative humidity + Air pressure	Weak factor and core factor interaction validation
Scenario 13	Runoff + Precipitation + Air pressure + Surface ground temperature + Sunlight hours + Wind speed and direction	Weak factor and core factor interaction validation

Table 2. Performance metrics of models under different meteorological input combinations during test period at Taojiang Station.

Model	Evaluation Indicators	Scenario 1	Scenario 2	Scenario 3	Scenario 4	Scenario 5	Scenario 6	Scenario 7	Scenario 8	Scenario 9	Scenario 10	Scenario 11	Scenario 12	Scenario 13
LSTM	NSE	0.46	0.54	0.83	0.81	0.78	0.88	0.86	0.91	0.70	0.67	0.89	0.63	0.55
	NRMSE	0.0585	0.0543	0.0348	0.0352	0.0383	0.0273	0.0305	0.0235	0.0435	0.0458	0.0248	0.0483	0.0535
	MAPE	57.49	47.10	32.40	35.28	39.67	30.92	31.25	21.69	41.39	45.79	25.32	47.52	53.39
	PPTS	85.74	79.66	50.90	53.08	58.42	37.98	41.68	31.49	62.88	65.36	39.32	83.41	84.65
CNN-LSTM	NSE	0.50	0.58	0.83	0.80	0.79	0.86	0.85	0.95	0.73	0.69	0.90	0.67	0.54
	NRMSE	0.0565	0.0516	0.0336	0.0361	0.0378	0.0297	0.0328	0.0184	0.0415	0.0449	0.0236	0.0461	0.0539
	MAPE	56.49	40.94	33.95	40.09	44.32	31.91	32.51	23.86	49.33	46.15	26.11	43.61	46.07
	PPTS	75.46	77.29	49.30	58.62	60.11	45.15	47.91	28.48	69.14	69.93	36.45	71.33	79.77
GBRT	NSE	0.42	0.63	0.80	0.78	0.76	0.84	0.79	0.91	0.72	0.70	0.88	0.66	0.60
	NRMSE	0.0606	0.0483	0.0363	0.0373	0.0393	0.0315	0.0338	0.0240	0.0419	0.0451	0.0283	0.0467	0.0508
	MAPE	61.71	46.56	30.88	31.78	33.11	28.42	29.70	21.60	35.06	39.32	24.23	41.38	48.53
	PPTS	85.79	79.78	39.21	44.50	48.45	35.13	37.58	28.74	55.09	60.11	32.11	69.49	81.57
TCN	NSE	0.46	0.59	0.82	0.78	0.78	0.89	0.87	0.96	0.74	0.69	0.90	0.59	0.53
	NRMSE	0.0588	0.0513	0.0343	0.0378	0.0386	0.0269	0.0302	0.0167	0.0409	0.0435	0.0243	0.0510	0.0547
	MAPE	59.91	57.31	31.60	33.54	37.23	31.32	31.65	23.27	44.53	47.32	26.96	48.55	56.80
	PPTS	78.46	78.66	41.58	43.80	47.32	37.52	39.63	27.90	58.80	64.75	30.85	75.88	81.23
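
As a quick cross-check of Table 2, the short Python sketch below picks the best-scoring input scenario for each model from the NSE rows. The values are transcribed from the table; the helper name `best_scenario` is hypothetical and chosen only for this illustration.

```python
# NSE values transcribed from Table 2 (Taojiang Station), Scenarios 1 to 13.
nse_taojiang = {
    "LSTM":     [0.46, 0.54, 0.83, 0.81, 0.78, 0.88, 0.86, 0.91, 0.70, 0.67, 0.89, 0.63, 0.55],
    "CNN-LSTM": [0.50, 0.58, 0.83, 0.80, 0.79, 0.86, 0.85, 0.95, 0.73, 0.69, 0.90, 0.67, 0.54],
    "GBRT":     [0.42, 0.63, 0.80, 0.78, 0.76, 0.84, 0.79, 0.91, 0.72, 0.70, 0.88, 0.66, 0.60],
    "TCN":      [0.46, 0.59, 0.82, 0.78, 0.78, 0.89, 0.87, 0.96, 0.74, 0.69, 0.90, 0.59, 0.53],
}


def best_scenario(scores):
    """Return (scenario_number, score) for the highest score; scenarios are 1-indexed."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index + 1, scores[best_index]


# Map each model to its best scenario and the NSE reached there.
best = {model: best_scenario(scores) for model, scores in nse_taojiang.items()}

for model, (scenario, score) in best.items():
    print(f"{model}: Scenario {scenario} (NSE = {score:.2f})")
```

For the transcribed values, every model reaches its highest NSE under Scenario 8, and TCN has the highest NSE among the four at that scenario.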

Table 3. Performance metrics of models under different meteorological input combinations during test period at Shaoyang Station.

Model	Evaluation Indicators	Scenario 1	Scenario 2	Scenario 3	Scenario 4	Scenario 5	Scenario 6	Scenario 7	Scenario 8	Scenario 9	Scenario 10	Scenario 11	Scenario 12	Scenario 13
LSTM	NSE	0.45	0.52	0.83	0.79	0.76	0.84	0.84	0.90	0.68	0.65	0.87	0.61	0.52
	NRMSE	0.0599	0.0553	0.0352	0.0359	0.0387	0.0280	0.0319	0.0247	0.0440	0.0467	0.0251	0.0484	0.0554
	MAPE	58.57	48.05	33.45	36.67	40.77	31.22	31.73	22.37	42.20	46.58	25.49	47.83	54.97
	PPTS	86.16	80.45	51.75	55.69	58.27	39.94	45.65	35.31	63.74	70.46	38.88	80.34	84.96
CNN-LSTM	NSE	0.48	0.55	0.81	0.78	0.75	0.85	0.84	0.94	0.70	0.66	0.89	0.66	0.50
	NRMSE	0.0574	0.0525	0.0341	0.0365	0.0384	0.0299	0.0341	0.0191	0.0419	0.0453	0.0239	0.0466	0.0541
	MAPE	59.07	41.33	35.71	40.92	47.15	32.54	32.98	24.53	47.84	45.83	26.78	42.39	50.26
	PPTS	81.34	78.57	52.39	56.38	60.56	45.68	48.60	30.57	70.89	72.63	38.27	76.52	80.08
GBRT	NSE	0.43	0.62	0.78	0.79	0.77	0.85	0.80	0.92	0.71	0.70	0.88	0.64	0.60
	NRMSE	0.0616	0.0507	0.0365	0.0378	0.0394	0.0313	0.0342	0.0241	0.0421	0.0459	0.0282	0.0475	0.0517
	MAPE	58.26	45.39	31.80	33.04	34.21	29.11	30.36	22.41	37.23	40.17	24.61	42.71	48.95
	PPTS	85.61	80.91	41.62	45.95	49.73	36.57	38.92	29.93	56.15	65.81	33.13	71.38	82.33
TCN	NSE	0.46	0.60	0.83	0.80	0.77	0.88	0.85	0.96	0.72	0.69	0.91	0.60	0.55
	NRMSE	0.0592	0.0514	0.0346	0.0382	0.0385	0.0261	0.0299	0.0169	0.0413	0.0436	0.0247	0.0508	0.0555
	MAPE	57.19	55.47	32.19	36.13	41.01	31.49	31.72	23.72	45.07	46.39	27.03	50.02	56.91
	PPTS	80.55	79.04	42.44	44.26	48.18	38.11	40.10	28.14	59.07	67.35	31.61	76.09	81.86

Note: MAPE and PPTS values are reported without percentage symbols.
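
For readers who want to reproduce the scale of the values in Tables 2 and 3, the following is a minimal Python sketch of three of the metrics under their commonly used definitions. It is not taken from the paper: the choice to normalise RMSE by the observed range is an assumption, PPTS is omitted, and the sample arrays are hypothetical. The definitions in Section 3.3 remain authoritative.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance_term = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / variance_term


def nrmse(obs, sim):
    """RMSE divided by the observed range (this normalisation is an assumption)."""
    rmse = math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))
    return rmse / (max(obs) - min(obs))


def mape(obs, sim):
    """Mean absolute percentage error as a percentage number, e.g. 21.69 means 21.69%."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)


# Hypothetical daily runoff values, for illustration only.
observed = [100.0, 200.0, 400.0, 300.0]
simulated = [110.0, 180.0, 400.0, 330.0]

print(f"NSE   = {nse(observed, simulated):.3f}")
print(f"NRMSE = {nrmse(observed, simulated):.4f}")
print(f"MAPE  = {mape(observed, simulated):.2f}")
```

Returning MAPE already multiplied by 100 mirrors the convention in the note above, where percentage metrics are tabulated without the % symbol.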

Table 4. Average SHAP values of meteorological features.

Meteorological Features	SHAP Value	Meteorological Features	SHAP Value
Precipitation	7.8189	Surface ground temperature	2.6512
Evaporation	−7.4823	Air pressure	2.3833
Air temperature	6.4520	Sunlight hours	0.8711
Relative humidity	−5.3683	Air velocity	0.1236
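
The ranking implied by Table 4 can be made explicit with a short Python sketch that sorts the features by the absolute value of their average SHAP value. The numbers are transcribed from the table; the function name `rank_by_magnitude` and the "share of total magnitude" column are illustrative additions, not quantities reported in the paper.

```python
# Average SHAP values transcribed from Table 4.
mean_shap = {
    "Precipitation": 7.8189,
    "Evaporation": -7.4823,
    "Air temperature": 6.4520,
    "Relative humidity": -5.3683,
    "Surface ground temperature": 2.6512,
    "Air pressure": 2.3833,
    "Sunlight hours": 0.8711,
    "Air velocity": 0.1236,
}


def rank_by_magnitude(values):
    """Return (feature, value, share) tuples sorted by |value|, largest first.

    share is |value| divided by the sum of all absolute values (illustrative only).
    """
    total = sum(abs(v) for v in values.values())
    ranked = sorted(values.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(name, value, abs(value) / total) for name, value in ranked]


ranking = rank_by_magnitude(mean_shap)

for name, value, share in ranking:
    sign = "positive" if value > 0 else "negative"
    print(f"{name:28s} {value:+8.4f}  {sign:8s}  {share:6.1%}")
```

Sorting on magnitude rather than on the signed value keeps evaporation and relative humidity, whose average SHAP values are negative, next to precipitation and air temperature at the top of the list.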

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ma, K.; Jiang, C.; Long, Y.; Wu, Z.; Yan, S. Enhanced Runoff Prediction in Zijiang River Basin Using Machine Learning and SHAP-Based Interpretability. Water 2026, 18, 601. https://doi.org/10.3390/w18050601

