Article

A Hybrid Runoff Forecasting Framework Integrating Hydrological Physics and Data-Driven Models

1 China Renewable Energy Engineering Institute, Beijing 100120, China
2 Ecosystem Study Commission for International Rivers, Beijing 100120, China
3 Qiantang River Basin Center of Zhejiang Province, Hangzhou 310016, China
4 College of Hydrology and Water Resources, Hohai University, Nanjing 210098, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(24), 11120; https://doi.org/10.3390/su172411120
Submission received: 3 November 2025 / Revised: 3 December 2025 / Accepted: 9 December 2025 / Published: 11 December 2025

Abstract

Runoff forecasting is essential for flood control, disaster mitigation, and sustainable water resources management. However, runoff processes are highly nonlinear and uncertain due to multiple interacting meteorological and underlying surface factors. Current models can be divided into process-driven and data-driven types. The former offers clear physical interpretability but involves complex calibration and simplifications, while the latter captures nonlinear relationships effectively but lacks physical consistency. To integrate their strengths, this study constructs process-based models and data-driven models, and proposes two hybrid strategies: (1) incorporating intermediate variables from physical models, such as soil moisture and runoff yield, as additional features for data-driven models, and (2) embedding physics-based constraints and synthetic data into loss functions. Using the Songxi River Basin as a case study, results show that both hybrid strategies significantly outperform standalone models. SHapley Additive exPlanations (SHAP)-based interpretability analysis further reveals the contribution mechanisms of key physical variables. This study demonstrates that coupling physical processes with data-driven learning effectively enhances runoff forecasting accuracy and offers a promising paradigm to support sustainable watershed management, climate-resilient water regulation, and flood risk reduction.

1. Introduction

Runoff, as a key component of the hydrological cycle, provides a crucial scientific basis for developing efficient flood control strategies, optimizing water resource allocation, and achieving long-term sustainable management [1,2]. The generation and evolution of runoff is an extremely complex process influenced by meteorological factors such as precipitation and evaporation, as well as underlying surface characteristics, including topography, soil, land use, and human activities. These combined effects lead to pronounced spatiotemporal variability and high uncertainty in the runoff series [3,4]. Hydrological models, which serve as the core tools for simulating the water cycle and forecasting runoff, have primarily evolved along two distinct technical pathways: process-driven (physics-based) and data-driven approaches [5].
Process-driven models, also known as physically based models, aim to describe and conceptualize the core physical mechanisms of the hydrological cycle—such as evapotranspiration, runoff generation, and flow routing—through systems of mathematical equations [6]. These models can generally be divided into two categories. The first comprises lumped or semi-distributed conceptual hydrological models [7], such as the Stanford, Xinanjiang (XAJ) [8], and TANK models, which feature relatively simple structures and parameters with clear physical meanings. The second category includes distributed hydrological models, such as the Soil and Water Assessment Tool (SWAT) [9], the Variable Infiltration Capacity (VIC) model, and MIKE SHE, which can explicitly represent the spatial heterogeneity of a basin [10]. However, any physical model is, by nature, a simplification and approximation of the complex real world, and thus cannot fully capture the true mechanisms and dynamic behavior of hydrological systems. In addition, these models often involve numerous parameters, require complicated calibration procedures, and incur high computational costs. Consequently, their forecasting accuracy may encounter limitations when applied to highly nonlinear or strongly uncertain watershed systems.
With the advancement of machine learning theory, data-driven models grounded in statistical principles have gained widespread application in the hydrological domain [11,12,13]. From early autoregressive (AR) models and support vector machines (SVM) [14,15] to artificial neural networks (ANN) [16,17], and more recently to deep learning architectures such as long short-term memory (LSTM) networks [18,19], gated recurrent units (GRU) [20], and sequence-to-sequence (Seq2Seq) models, data-driven methods have demonstrated remarkable potential [21]. These models bypass the complex description of physical processes and instead perform simulation and prediction by learning the nonlinear mapping relationships between historical input variables (e.g., precipitation and evaporation) and output variables (e.g., runoff) [22]. Despite their strong performance in handling nonlinear problems, data-driven models also exhibit notable drawbacks. First, their predictive capability is highly dependent on the quantity and quality of training data, making them prone to overfitting or poor generalization in data-scarce regions or under extreme conditions [23]. Second, their “black-box” nature results in limited physical interpretability and a lack of transparency in the decision-making process, which constrains their credibility and applicability in real-world hydrological forecasting [24]. These limitations have motivated hydrologists to explore more comprehensive modeling frameworks that integrate physical processes, multi-source data, and advanced learning techniques.
In addition to 1D runoff forecasting, a growing body of hydrological research has emphasized integrating 2D hydrodynamic modeling, remote-sensing observations, and machine-learning ensembles to support more comprehensive flood-simulation frameworks. For example, previous studies have demonstrated that coupling 2D hydraulic models with remote-sensing-supported ensemble machine-learning methods can effectively characterize spatial inundation patterns and improve flood-hazard assessment [25]. Sensitivity-analysis studies of 2D flood-inundation models further reveal the dominant physical parameters that govern inundation extent and model stability, offering valuable guidance for optimizing hydrological model inputs [26]. Meanwhile, multi-scale decomposition techniques, exemplified by wavelet-transform–artificial neural network (WT–ANN) hybrid models, have proven effective in separating high-frequency and low-frequency components of hydrological signals, thereby enhancing both short-term and long-term forecasting performance in complex river basins [27]. Synthesizing these advances helps conceptually bridge the gap between 1D runoff forecasting and broader flood-modeling research, and situates the present hybrid framework within the expanding landscape of physics–ML hydrological modeling.
Given the respective advantages and limitations of process-driven and data-driven models, integrating the physical consistency of the former with the nonlinear learning capability of the latter to achieve complementary strengths has become a major research focus in runoff forecasting [28]. This study aims to explore effective pathways for integrating physical-process models with data-driven approaches to improve the accuracy and reliability of runoff prediction. Taking the Songxi River Basin as a case study, we first systematically construct and evaluate several representative process-driven and data-driven models. On this basis, two distinct hybrid strategies are designed and validated: (1) incorporating intermediate outputs from physical models (e.g., soil moisture and runoff yield) as additional inputs to drive data-driven models, and (2) introducing physics-based constraints—such as water balance conditions and synthetic data for extreme events—into the training process of data-driven models. Finally, by employing an interpretability tool (SHAP) [29], we analyze the internal decision mechanisms of the hybrid models, offering a new perspective for enhancing the transparency of “black-box” models. Through these efforts, this study seeks to provide more robust scientific support for watershed water-resources management and flood disaster mitigation.

2. Study Area and Methods

2.1. Study Area

The study area is the catchment controlled by the Songxi Hydrological Station, located in Nanping City, Fujian Province, China. The basin covers a drainage area of approximately 1628 km², and its terrain slopes from the northeast to the southwest. The region experiences a subtropical humid monsoon climate, with an average annual precipitation ranging from 1557 mm to 1743 mm. Rainfall shows distinct spatiotemporal variability, with the majority concentrated from March to July. This uneven distribution makes the area susceptible to both floods and droughts, underscoring the practical importance of high-precision runoff forecasting. The geographical location of the basin and its monitoring network are shown in Figure 1.
Data for this study were compiled from multiple sources. Daily runoff data from the Songxi Hydrological Station were used as the prediction target. Meteorological inputs included daily rainfall data from six rainfall stations within the basin (Qingyuan, Yaocun, Zhongcun, Hutong, Weitian, and Songxi) and daily evaporation data from the Songxi station, primarily sourced from the Hydrological Yearbooks. Additionally, a gridded daily meteorological dataset (CN05) with a spatial resolution of 0.25° was obtained from the Climate Change Research Center, Chinese Academy of Sciences (https://ccrc.iap.ac.cn/resource/, accessed on 2 November 2025), providing variables such as precipitation, air temperature, relative humidity, wind speed, and sunshine duration. Land use data were derived from the Resource and Environmental Science and Data Center (https://www.resdc.cn/), using the 2020 national land use distribution map of China. Soil data were obtained from the Harmonized World Soil Database (HWSD), available through the Food and Agriculture Organization (FAO) soil portal (https://www.fao.org/).

2.2. Process-Driven Runoff Forecasting Models

Two widely used process-driven models were selected to serve as physical benchmarks.

2.2.1. Xinanjiang Model

The Xinanjiang model [5] is a conceptual rainfall–runoff hydrological model. Owing to its simple structure and parameters with clear physical meanings, it has been widely applied in hydrological runoff forecasting studies. As a representative process-driven model, the Xinanjiang model consists of four computational modules: evapotranspiration, runoff generation, flow partitioning, and flow routing. Each module employs mathematical equations to describe the physical processes of the hydrological cycle.
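To make the runoff-generation module more concrete, the sketch below implements the saturation-excess calculation based on the tension-water storage-capacity curve, in the form commonly published for the Xinanjiang model. The parameter names `wm` (areal mean tension-water capacity, mm) and `b` (exponent of the storage-capacity curve) follow common usage in the literature; the function name and the illustrative values are assumptions, not the calibrated configuration used in this study.

```python
def xaj_runoff(pe: float, w0: float, wm: float, b: float) -> float:
    """Runoff yield (mm) from net rainfall pe = P - E (mm) under the XAJ storage curve.

    pe: net rainfall after evapotranspiration (mm)
    w0: initial areal mean tension-water storage (mm), 0 <= w0 <= wm
    wm: areal mean tension-water capacity (mm)
    b:  exponent of the tension-water storage-capacity curve
    """
    if pe <= 0.0:
        return 0.0  # no net rainfall, so no runoff is generated
    wmm = wm * (1.0 + b)  # maximum point tension-water capacity in the basin
    # ordinate on the capacity curve that corresponds to the current storage w0
    a = wmm * (1.0 - (1.0 - w0 / wm) ** (1.0 / (1.0 + b)))
    if pe + a < wmm:
        # only part of the basin reaches saturation
        r = pe - (wm - w0) + wm * (1.0 - (pe + a) / wmm) ** (1.0 + b)
    else:
        # the whole basin is saturated: everything beyond the storage deficit runs off
        r = pe - (wm - w0)
    return max(r, 0.0)
```

With a saturated basin (`w0 == wm`) all net rainfall becomes runoff, while a dry basin converts only a small fraction of the same rainfall, which is the nonlinearity the hybrid models later try to exploit.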

2.2.2. SWAT Model

The Soil and Water Assessment Tool (SWAT) is a process-based hydrological model used to simulate watershed hydrological processes, soil erosion, and non-point source pollution [30,31]. The model divides a watershed into multiple sub-basins based on a threshold area for the minimum sub-basin size. Within each sub-basin, further subdivision is performed according to land use and soil types, forming Hydrological Response Units (HRUs) with relatively homogeneous underlying surface characteristics. Each HRU operates independently and serves as the fundamental computational unit of the model [32]. Hydrological processes are simulated for each HRU and then routed through the river network to the watershed outlet, allowing for a detailed representation of spatial heterogeneity.
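As an illustration of the HRU-level computation, SWAT can estimate daily surface runoff with the SCS curve-number method. The sketch below uses the daily curve-number equation in metric units (retention parameter S in mm, initial abstraction 0.2S) and then aggregates HRUs by area fraction. This is a simplified assumption: SWAT additionally adjusts the curve number for antecedent soil moisture and slope and applies lag and channel routing, none of which are shown here, and the CN values are illustrative.

```python
from typing import Iterable, Tuple


def scs_runoff_mm(rain_mm: float, cn: float) -> float:
    """Daily surface runoff depth (mm) from the SCS curve-number equation."""
    s = 25.4 * (1000.0 / cn - 10.0)  # retention parameter (mm)
    ia = 0.2 * s  # initial abstraction (mm)
    if rain_mm <= ia:
        return 0.0  # all rainfall is abstracted before runoff starts
    return (rain_mm - ia) ** 2 / (rain_mm - ia + s)


def subbasin_runoff_mm(rain_mm: float, hrus: Iterable[Tuple[float, float]]) -> float:
    """Area-weighted surface runoff of a sub-basin from (area_fraction, cn) HRU pairs."""
    return sum(frac * scs_runoff_mm(rain_mm, cn) for frac, cn in hrus)
```

Because each HRU is computed independently, the spatial heterogeneity of land use and soil enters only through the per-HRU curve numbers and area fractions.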

2.3. Data-Driven Runoff Forecasting Models

According to the complexity of their structural design, data-driven models can generally be categorized into shallow machine learning models and deep learning models [33]. Shallow machine learning models typically have relatively simple structures, such as logistic regression, SVM, and decision trees. These models apply linear or relatively simple nonlinear transformations directly to the input features for tasks such as classification, regression, or clustering. They are mainly used for relatively simple datasets or those with low feature dimensionality, and offer advantages such as high computational efficiency, ease of implementation, and interpretability.
In contrast, deep learning models refer to those employing multilayer neural-network architectures within the field of machine learning. Through successive nonlinear transformations across multiple hidden layers, these models are capable of learning hierarchical feature representations from data. Common deep learning models include convolutional neural networks (CNN), recurrent neural networks (RNN) [34], and autoencoder architectures. Owing to their powerful feature-learning and generalization capabilities, deep learning models have been widely applied to complex domains such as image recognition and natural language processing.

2.3.1. SVR Model

The Support Vector Regression (SVR) model is a regression algorithm based on the SVM framework. SVM was originally developed as a learning algorithm for classification problems: it constructs a separating hyperplane that maximizes the margin between classes, with the hyperplane determined only by a small subset of training samples (the support vectors), and thereby effectively solves linearly separable problems [35]. When extended to regression, SVR fits a hyperplane in a (kernel-induced) feature space such that as many samples as possible lie within an ε-wide tube around it, balancing the flatness of the fitted function (a large margin) against the loss on training samples that fall outside the tube. The SVR model can be expressed as follows:
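$$\hat{Y}_i = w^{T}\phi(X_i) + b$$
Here, $\hat{Y}_i$ represents the predicted output of the model, $X_i$ denotes the input samples, $\phi$ is the nonlinear mapping function, $w$ is the weight vector, and $b$ is the bias term.
By introducing a constant $\varepsilon > 0$, the loss function of the SVR model can be defined as follows:
$$err(X_i, Y_i) = \begin{cases} 0, & \left| Y_i - \left(w^{T}\phi(X_i) + b\right) \right| \le \varepsilon \\ \left| Y_i - \left(w^{T}\phi(X_i) + b\right) \right| - \varepsilon, & \left| Y_i - \left(w^{T}\phi(X_i) + b\right) \right| > \varepsilon \end{cases}$$
Thus, the regression problem is transformed into an optimization problem:
$$\min_{w,\, b,\, \xi_i,\, \xi_i^{*}} \ \frac{1}{2}\lVert w \rVert^{2} + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right)$$
$$\text{s.t.}\quad -\varepsilon - \xi_i^{*} \le Y_i - \left(w^{T}\phi(X_i) + b\right) \le \varepsilon + \xi_i, \qquad \xi_i,\ \xi_i^{*} \ge 0$$
where $C$ is the penalty coefficient; $\xi_i$ and $\xi_i^{*}$ are the slack variables, while the remaining symbols retain the same meanings as previously defined. Finally, by introducing the Lagrange multipliers $\lambda_i$ and $\lambda_i^{*}$, we obtain:
$$w = \sum_{i=1}^{n}\left(\lambda_i - \lambda_i^{*}\right)\phi(X_i)$$
$$\hat{Y}(X) = \sum_{i=1}^{n}\left(\lambda_i - \lambda_i^{*}\right) K(X_i, X) + b$$
In this equation, $K(X_i, X) = \phi(X_i)^{T}\phi(X)$ is the kernel function, which evaluates the inner product in the nonlinear feature space without computing $\phi$ explicitly, and the remaining symbols have the same meanings as defined above.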
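The two central pieces of the formulation above, the ε-insensitive loss and the kernel expansion used for prediction, can be sketched in a few lines of dependency-free Python. The Gaussian (RBF) kernel is an assumption here, since the text does not state which kernel was used, and the support vectors and dual coefficients (the differences λ_i − λ_i*) are hand-picked placeholders; in practice they come from solving the dual optimization problem (a fitted scikit-learn `SVR` exposes them as `support_vectors_`, `dual_coef_`, and `intercept_`).

```python
import math
from typing import Sequence


def eps_insensitive_loss(y: float, y_hat: float, eps: float) -> float:
    """Zero inside the epsilon-tube, linear in the excess deviation outside it."""
    return max(0.0, abs(y - y_hat) - eps)


def rbf_kernel(x1: Sequence[float], x2: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel K(x1, x2) = exp(-gamma * ||x1 - x2||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x1, x2)))


def svr_predict(
    x: Sequence[float],
    support_vectors: Sequence[Sequence[float]],
    dual_coefs: Sequence[float],  # lambda_i - lambda_i* for each support vector
    b: float,
    gamma: float,
) -> float:
    """Kernel expansion Y_hat(x) = sum_i (lambda_i - lambda_i*) K(X_i, x) + b."""
    return sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)) + b
```

Only samples on or outside the tube receive nonzero dual coefficients, so the prediction depends on the support vectors alone.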
SVR has been increasingly applied in flood-related studies owing to its strong capability to capture nonlinear rainfall–runoff relationships. In regional flood-frequency analysis, SVR has been used to estimate flood quantiles in basins with limited hydrological records, demonstrating reliable performance compared with traditional regression and neural-network approaches [36]. SVR has also been successfully applied in flash-flood forecasting in small mountainous catchments, where rapid hydrological response and short concentration times pose significant challenges. By using lagged rainfall and runoff as inputs, SVR is able to reproduce short-lead predictions and key hydrological response characteristics, making it suitable for operational applications such as early flood warning, peak-flow forecasting, and flood-risk mitigation [37].

2.3.2. XGBoost Model

The XGBoost model [38] is an improved gradient boosting ensemble learning algorithm. Similar to other boosting methods, it integrates multiple base learners, which are typically Classification and Regression Trees (CART). XGBoost constructs regression trees iteratively in a sequential manner, where each new tree is grown by splitting on features to fit the residuals (more generally, the gradients of the loss) of the ensemble built so far. The final prediction is obtained by summing the outputs of all regression trees. Assuming that a total of K regression trees are constructed, the predicted value of the i-th sample can be expressed as:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
The objective function of XGBoost consists of two components: the training loss and the regularization term. The loss term represents the prediction error of the model, while the regularization term serves as a penalty factor designed to prevent overfitting:
$$Obj = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$
$$\Omega(f_k) = \gamma P + \frac{\lambda}{2}\sum_{p=1}^{P} w_p^{2}$$
In this equation, l represents the loss function that measures the error between the predicted and observed values; Ω denotes the regularization term; λ and γ are the regularization coefficients; w refers to the leaf weights of the decision tree; and P is the number of leaves in the tree.
Since XGBoost is an additive model, during the t-th iteration:
$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$$
$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \text{constant}$$
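By applying a second-order Taylor expansion to the loss around $\hat{y}_i^{(t-1)}$ and removing the constant term, the objective function at iteration $t$ can be expressed as:
$$Obj^{(t)} \simeq \sum_{i=1}^{n}\left[g_i f_t(x_i) + \frac{1}{2} h_i f_t^{2}(x_i)\right] + \gamma P + \frac{\lambda}{2}\sum_{p=1}^{P} w_p^{2}$$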
In this equation, $g_i = \partial l\left(y_i, \hat{y}_i^{(t-1)}\right) / \partial \hat{y}_i^{(t-1)}$ represents the first derivative of the loss function, and $h_i = \partial^{2} l\left(y_i, \hat{y}_i^{(t-1)}\right) / \partial \left(\hat{y}_i^{(t-1)}\right)^{2}$ denotes the second derivative of the loss function.
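Grouping the samples by the leaf they fall into (so that $f_t(x_i) = w_p$ for every sample in leaf $p$), taking the derivative of the objective function with respect to each leaf weight $w_p$, and setting it to zero yields the optimal leaf weight and the simplified objective function:
$$w_p^{*} = -\frac{G_p}{H_p + \lambda}$$
$$Obj^{(t)} = -\frac{1}{2}\sum_{p=1}^{P}\frac{G_p^{2}}{H_p + \lambda} + \gamma P$$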
In this equation, $G_p = \sum_{i \in I_p} g_i$ represents the sum of the first derivatives of all samples in leaf node $p$ (where $I_p$ is the set of samples assigned to leaf $p$), and $H_p = \sum_{i \in I_p} h_i$ represents the sum of the second derivatives of all samples in that leaf.
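Once the gradients are known, the leaf-weight and objective formulas above reduce to simple arithmetic. The sketch below assumes a squared-error loss l = ½(y − ŷ)² (the text does not state which loss was used), for which g_i = ŷ_i − y_i and h_i = 1. It computes the optimal weight of a leaf, the structure score of a tree given per-leaf (G_p, H_p) sums, and the gain from splitting one leaf into two, which is the quantity used to choose splits. Function names are illustrative, not part of the XGBoost library API.

```python
from typing import List, Sequence, Tuple


def grad_hess_squared(y: Sequence[float], y_pred: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First and second derivatives of l = 0.5 * (y - y_pred)^2 w.r.t. y_pred."""
    g = [p - t for t, p in zip(y, y_pred)]
    h = [1.0] * len(y)
    return g, h


def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)


def structure_score(leaves: Sequence[Tuple[float, float]], lam: float, gamma: float) -> float:
    """Obj = -1/2 * sum_p G_p^2 / (H_p + lambda) + gamma * P for a list of (G_p, H_p)."""
    return -0.5 * sum(G * G / (H + lam) for G, H in leaves) + gamma * len(leaves)


def split_gain(GL: float, HL: float, GR: float, HR: float, lam: float, gamma: float) -> float:
    """Reduction in Obj when one leaf is split into a left and a right child."""
    parent = (GL + GR) ** 2 / (HL + HR + lam)
    children = GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam)
    return 0.5 * (children - parent) - gamma
```

For example, with targets [1, 2, 3, 10], an initial prediction of 0 and λ = 1, a single leaf receives weight 3.2, and splitting off the last sample gives a gain of 3.9 before γ is subtracted; the γ term therefore acts as a minimum gain a split must exceed.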
XGBoost has been increasingly adopted in flood-related studies due to its strong capability in handling nonlinear feature interactions and high-dimensional geospatial variables. In multi-hazard environments, XGBoost has been used to generate flood-susceptibility maps by integrating topographic, geological, and remote-sensing predictors, providing valuable information for land use planning and risk mitigation in flood-prone regions [39]. Furthermore, XGBoost has been applied to data-driven flood alert systems to forecast water-level stages at short time intervals, enabling continuous, automatic detection of approaching flood events and supporting real-time early-warning operations [40]. These applications demonstrate that XGBoost is not only effective in predictive modeling but also offers practical value for flood-risk assessment and operational flood-warning systems.

2.3.3. GRU Model

RNNs possess recurrent structures that allow them to model contextual dependencies, which gives them significant advantages in sequence classification and prediction tasks. However, RNNs also suffer from inherent limitations, such as vanishing and exploding gradients, which constrain their performance on long sequences. To overcome these issues, researchers have proposed several improved variants. The LSTM network, through its distinctive “three-gate” architecture (the forget gate, input gate, and output gate), effectively mitigates the vanishing-gradient problem that standard RNNs encounter when learning long-term dependencies. This design enables LSTM to retain long-term information in sequences while selectively forgetting outdated information when new data arrive. Despite its remarkable success in various sequence modeling tasks, the more complex structure of LSTM inevitably introduces additional computational cost.
The GRU, proposed by Cho et al. in 2014 [41], is an improved recurrent neural network model featuring a more compact design that reduces the number of parameters and enhances training efficiency. Due to its conceptual similarity to the LSTM architecture, GRU is often regarded as a variant of LSTM. However, unlike the “three-gate” structure of LSTM, GRU employs only two gates—the update gate and the reset gate—which enable the network to selectively update and forget input information. This mechanism allows GRU to better capture contextual dependencies when processing long sequences and, under certain conditions, to achieve superior performance. The internal structure of GRU is illustrated in Figure 2.
The main computational process of the GRU can be expressed as follows:
$$Z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right)$$
$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right)$$
$$g_t = \tanh\left(W_g \cdot [r_t \odot h_{t-1}, x_t]\right)$$
$$h_t = (1 - Z_t) \odot h_{t-1} + Z_t \odot g_t$$
$$y_t = \sigma\left(W_y \cdot h_t\right)$$
In these equations, $Z_t$ represents the update gate state at time $t$; $r_t$ denotes the reset gate state at time $t$; $g_t$ refers to the candidate hidden state at time $t$; $x_t$ is the input at time $t$; $h_{t-1}$ and $h_t$ are the memory (hidden) states at times $t-1$ and $t$, respectively; $y_t$ is the output at time $t$; $[\cdot,\cdot]$ denotes vector concatenation; $\odot$ denotes element-wise multiplication; $W$ denotes the weight coefficients; and $\sigma$ represents the sigmoid activation function.
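A single GRU time step following the equations above can be sketched without any deep learning framework. This is an illustrative sketch under stated assumptions: bias terms are omitted (as in the equations), plain lists stand in for arrays, and each gate matrix is assumed to have `hidden_size` rows and `hidden_size + input_size` columns. Because the output uses a sigmoid, it lies in (0, 1), which presumes a normalized runoff target. Trained weights would come from backpropagation through time, not from this code.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def matvec(W: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def sigmoid(v: Sequence[float]) -> List[float]:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def gru_step(
    x_t: List[float], h_prev: List[float], W_z: Matrix, W_r: Matrix, W_g: Matrix, W_y: Matrix
) -> Tuple[List[float], List[float]]:
    """One GRU step: returns the new hidden state h_t and the output y_t."""
    hx = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    z = sigmoid(matvec(W_z, hx))  # update gate Z_t
    r = sigmoid(matvec(W_r, hx))  # reset gate r_t
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t (element-wise) h_{t-1}
    g = [math.tanh(a) for a in matvec(W_g, reset_h + x_t)]  # candidate state g_t
    # blend previous state and candidate according to the update gate
    h_t = [(1.0 - zi) * hi + zi * gi for zi, hi, gi in zip(z, h_prev, g)]
    y_t = sigmoid(matvec(W_y, h_t))
    return h_t, y_t
```

With all weights set to zero, both gates equal 0.5 and the candidate state is 0, so the hidden state is simply halved at each step, a quick sanity check of the update-gate blending.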
In recent years, GRU-based deep learning models have also been widely applied in flood-prone regions due to their ability to capture nonlinear rainfall–runoff relationships and long-term dependencies. For example, GRU has been used to improve daily streamflow simulations in flood-prone basins with low-convergence hydrological data, where tailored input-selection strategies and outlier-removal techniques were shown to significantly enhance model robustness and predictive stability [42]. Such studies highlight the strong capability of GRU models to recognize complex hydrological patterns and to support streamflow forecasting under challenging data conditions, providing practical value for flood-risk mitigation and real-time water-resources management.

2.3.4. Seq2seq-Attention Model

The GRU-Seq2seq-Attention model extends the GRU architecture by introducing an encoder–decoder structure and an attention mechanism. The Seq2seq model allows the network to handle input and output sequences of different lengths and alleviates information loss through the transmission of hidden states between the encoder and decoder, thereby enabling the model to maintain long-term memory for long-sequence prediction tasks.
The core principle of the Seq2seq structure is that the encoder transforms the input sequence into a fixed-length context vector C, and the decoder converts this context vector C into an output sequence. Each unit of the encoder extracts features from the input sequence and passes its hidden state to the next unit, progressively aggregating contextual information. The final context vector C represents the encoded hidden state of the entire input sequence, encapsulating all input features. The main equations of the encoder are as follows:
$$h_t = GRU_{enc}(h_{t-1}, x_t)$$
In this equation, $h_{t-1}$ and $h_t$ represent the hidden states of the GRU neurons in the encoder at time steps $t-1$ and $t$, respectively; $x_t$ denotes the input data at time $t$; and $GRU_{enc}$ refers to the GRU model in the encoder.
However, encoding the entire input sequence into a fixed-length context vector C may lead to cumulative information loss. Moreover, because the basic Seq2seq model passes the same context vector to every decoding step, all parts of the input sequence are effectively weighted equally, which makes it difficult for the network to selectively extract salient features from the input sequence.
The introduction of the Attention mechanism effectively overcomes these limitations. With Attention, the model no longer relies on compressing the entire input sequence into a single fixed-length context vector C; instead, at each decoding step it computes the relevance (or contribution) of each encoder hidden state and assigns different attention weights accordingly. This enables the decoder to draw on useful information from all encoder hidden states and improves prediction accuracy. The main equations of the decoder are as follows:
$$a_t = \mathrm{softmax}\left(W_a \cdot \mathrm{concat}(h_s, h_t)\right)$$
$$C_t = \sum_{s} a_{t,s}\, h_s$$
$$y_t = GRU_{dec}(h_t, C_t, y_{t-1})$$
In these equations, $h_s$ represents the encoder hidden states (one for each input time step $s$); $h_t$ denotes the hidden state of the decoder at time $t$; $W_a$ is the weight coefficient; $a_t$ represents the attention weights calculated by the Attention mechanism, with $a_{t,s}$ the weight assigned to $h_s$; $C_t$ is the context vector at time $t$; $y_{t-1}$ and $y_t$ denote the outputs at time steps $t-1$ and $t$, respectively; concat refers to the concatenation function; softmax is the normalization function that converts the scores into weights summing to one; and $GRU_{dec}$ represents the GRU model in the decoder.
The structure of the Seq2seq model is shown in Figure 3.
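The attention step of the decoder can be sketched as follows. Here $W_a$ is treated as a single weight vector that maps each concatenated pair (h_s, h_t) to a scalar score; this shape is an assumption, since the text does not specify it. The function returns the attention weights over the encoder positions and the resulting context vector as their weighted sum of encoder states.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax (the max score is subtracted before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_context(
    encoder_states: Sequence[List[float]], h_dec: List[float], w_a: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Attention weights a_t over encoder states and context vector C_t = sum_s a_{t,s} h_s."""
    dim = len(encoder_states[0])
    if len(w_a) != dim + len(h_dec):
        raise ValueError("w_a must match the length of concat(h_s, h_t)")
    # score_s = w_a . concat(h_s, h_t), one scalar per encoder time step
    scores = [sum(w * v for w, v in zip(w_a, h_s + h_dec)) for h_s in encoder_states]
    a_t = softmax(scores)
    c_t = [sum(a * h_s[k] for a, h_s in zip(a_t, encoder_states)) for k in range(dim)]
    return a_t, c_t
```

With an all-zero weight vector every encoder state receives the same weight and the context vector reduces to the plain average, which mirrors the "equal weighting" limitation of the basic Seq2seq model; learned weights let the decoder emphasize the most relevant time steps instead.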

2.3.5. SHAP Interpreter

The core concept of the SHAP interpreter is derived from the Shapley value in game theory, which was originally proposed to address the problem of fairly distributing total gains among participants in a cooperative game. Based on this “fair allocation” principle, the SHAP algorithm provides a systematic solution for quantifying the contribution of each participant. Lundberg and Lee extended this idea to machine learning by computing Shapley values for trained models, thereby evaluating the contribution of each input feature to the model’s output. This approach enhances the transparency of black-box models and allows for a more intuitive understanding of their internal mechanisms.
The Shapley value is calculated as the average of all possible marginal contributions of a given feature across all feature combinations. The computation formula for the SHAP value is as follows:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!}\left[v(S \cup \{i\}) - v(S)\right]$$
In this equation, $\phi_i$ denotes the contribution value of feature $i$; $N$ represents the set of all features; $n$ is the total number of features; $S$ refers to a subset that does not contain feature $i$; $v(S)$ is the value function of subset $S$, i.e., the model output when only the features in $S$ are known; and $v(S \cup \{i\})$ is the value of the new subset obtained by adding feature $i$.
SHAP is regarded as an additive feature attribution method, in which the model prediction is expressed as a base value plus the sum of the SHAP values of all individual features. The specific formula is as follows:
$$g(z') = \phi_0 + \sum_{i=1}^{n}\phi_i z'_i$$
In this equation, $z' \in \{0, 1\}^{n}$ is the simplified input vector indicating whether each feature is present (1) or absent (0) in the subset; $\phi_0$ denotes the base value, i.e., the mean model output over the training samples; and $\phi_i$ represents the Shapley value of each input feature.
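The Shapley formula can be evaluated exactly by enumerating every subset, which is feasible only for a handful of features since the cost grows as 2^n. The sketch below does this for a toy value function with hypothetical features P (precipitation), E (evaporation) and SM (soil moisture), including a P × SM interaction; it is not derived from the study's models. It also illustrates the additivity property stated above: the base value plus all Shapley values reproduces the full-model output.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(features: Sequence[str], v: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley values by enumerating all subsets S of N \\ {i}."""
    n = len(features)
    phi: Dict[str, float] = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(n):  # subset sizes |S| = 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (v(s | {i}) - v(s))  # weighted marginal contribution
        phi[i] = total
    return phi


def toy_value(s: FrozenSet[str]) -> float:
    """Hypothetical value function: base 1.0, additive P and E effects, P x SM interaction."""
    out = 1.0
    if "P" in s:
        out += 3.0
    if "E" in s:
        out -= 1.0
    if "P" in s and "SM" in s:
        out += 2.0  # interaction credit is split equally between P and SM
    return out
```

Here P receives 4.0 (its own effect plus half the interaction), SM receives 1.0, and E receives −1.0; together with the base value of 1.0 they sum to the full-model output of 5.0.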
For different data-driven models, the SHAP framework provides several estimation approaches suited to different model types and application scenarios. The Kernel Explainer is applicable to a wide range of models. It constructs a weighted linear regression model to approximate the predictions of the original model, where the contribution of each feature to the prediction is represented as a linear coefficient. The SHAP values for each feature are then obtained by solving the coefficients of this weighted linear regression model.
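The weighting in that regression is what makes its coefficients equal to Shapley values. In the Kernel SHAP formulation of Lundberg and Lee, a coalition with s of m features present receives the weight (m − 1) / (C(m, s) · s · (m − s)). The sketch below computes this weight; the empty and full coalitions formally receive infinite weight and are usually enforced as constraints (or given a very large finite weight) rather than used directly.

```python
from math import comb


def shapley_kernel_weight(m: int, s: int) -> float:
    """Kernel SHAP weight for a coalition with s of m features present."""
    if not 0 <= s <= m:
        raise ValueError("coalition size must lie in [0, m]")
    if s == 0 or s == m:
        return float("inf")  # handled as constraints in practice
    return (m - 1) / (comb(m, s) * s * (m - s))
```

Small and nearly full coalitions receive the largest finite weights because they isolate the effect of individual features most directly.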
For tree-based models (such as decision trees, random forests, and XGBoost), the SHAP interpreter typically uses the Tree Explainer for estimation. This method leverages the structural properties of tree models and calculates each feature’s contribution by traversing the decision nodes within the trees.

2.4. Coupling Methods Between Process-Driven and Data-Driven Models

To further improve runoff prediction accuracy and enhance model interpretability, this study develops a hybrid modeling framework based on two complementary strategies: integrating outputs from physical hydrological models and incorporating physics-based constraints into data-driven learning. The overall workflow of the proposed framework is illustrated in Figure 4.

2.4.1. Data-Driven Runoff Forecasting Model Integrated with Physical Models

The core concept of this strategy is to enhance the input information of data-driven models by incorporating intermediate variables simulated from physical models that possess explicit physical meanings [43,44,45]. Specifically, the calibrated Xinanjiang and SWAT models are first executed to obtain time-series data of physical variables such as soil moisture, surface runoff, and basin outlet discharge [46]. These physically based variables are then used as additional input features, combined with the original meteorological data (precipitation and evaporation) and historical runoff data, to train and predict using the GRU and Seq2seq models.
Through this integration, the data-driven models can not only learn statistical patterns from observed data but also directly leverage prior hydrological knowledge provided by the physical models regarding the internal states of the watershed. The specific scheme configurations are summarized in Table 1 and Table 2.
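The input augmentation in this strategy amounts to stacking physical-model outputs alongside the meteorological and runoff series before slicing them into training windows. The sketch below assumes hypothetical series names (for example `P`, `E`, `SM_xaj`, `Q_obs`) on a common daily time axis, a look-back window of past days, and a target `lead` days ahead; whether same-day physical-model outputs should also be included is a design choice not specified here.

```python
from typing import Dict, List, Sequence, Tuple


def build_hybrid_samples(
    series: Dict[str, Sequence[float]],
    feature_keys: Sequence[str],
    target_key: str,
    lookback: int,
    lead: int = 1,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Slice aligned daily series into (lookback x n_features) windows and lead-day targets."""
    n = len(series[target_key])
    if any(len(series[k]) != n for k in feature_keys):
        raise ValueError("all series must have the same length")
    X: List[List[List[float]]] = []
    y: List[float] = []
    # window covers days t - lookback .. t - 1; target is day t + lead - 1
    for t in range(lookback, n - lead + 1):
        window = [[series[k][s] for k in feature_keys] for s in range(t - lookback, t)]
        X.append(window)
        y.append(series[target_key][t + lead - 1])
    return X, y
```

Switching between a purely data-driven scheme and a hybrid scheme then only changes `feature_keys` (for instance adding simulated soil moisture or SWAT discharge), which keeps the comparison between configurations controlled.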

2.4.2. Physically Constrained Data-Driven Runoff Forecasting Model

This strategy aims to embed hydrological physical laws directly into the learning process of data-driven models to constrain model behavior and ensure that the predicted results maintain physical consistency [47]. In this study, the incorporation of physical mechanisms is primarily implemented in two aspects. The overall framework of the proposed physically constrained data-driven runoff forecasting model is illustrated in Figure 5.
  • Water Balance and Synthetic Data:
To address the scarcity of extreme events (e.g., high-intensity rainfall and prolonged drought) in observed datasets, this study constructs synthetic rainfall and drought event sets based on the water balance equation ($Q = P - E - \Delta S$, where $Q$ is runoff, $P$ is precipitation, $E$ is evapotranspiration, and $\Delta S$ is the change in basin water storage). The synthetic rainfall events simulate precipitation processes with different magnitudes and temporal distributions, assuming fully saturated soil conditions in which rainfall is almost entirely converted into runoff. By minimizing the loss between the model-predicted runoff and the runoff of the synthetic rainfall dataset during training, the model can be guided to better capture peak-flow responses. The rainfall–runoff constraint loss function is defined as follows:
$$MSE_{peak} = \frac{1}{N}\sum_{i=1}^{N}\left(Q_{sim,i} - Q_{syn,i}\right)^{2}$$
In this equation, $MSE_{peak}$ represents the rainfall constraint loss function; $N$ denotes the number of samples in the synthetic rainfall dataset; $Q_{sim,i}$ is the runoff predicted by the model for sample $i$; and $Q_{syn,i}$ is the corresponding runoff value from the synthetic dataset.
Similarly, a synthetic drought dataset is constructed. By minimizing the loss between the model-predicted runoff and the runoff of the synthetic drought dataset during training, the model is guided to suppress excessive baseflow and improve its ability to capture low-flow conditions. The drought (no-rain) constraint loss function is defined as follows:
$$MSE_{base} = \frac{1}{N}\sum_{i=1}^{N}\left(Q_{sim,i} - Q_{syn,i}\right)^{2}$$
In this equation, $MSE_{base}$ represents the drought constraint loss function; the other symbols are defined as above, with $N$ and $Q_{syn,i}$ now referring to the synthetic drought dataset.
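The constraint terms are ordinary mean squared errors evaluated on different sample sets, so combining them with the data-fitting term is straightforward. The sketch below uses a plain weighted sum with hypothetical weights `w_peak` and `w_base`; the exact weighting used in the study is not given in this passage. In an actual training loop these terms would be computed on framework tensors so that gradients flow through all three; plain floats are used here only for illustration.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, the form shared by MSE_data, MSE_peak and MSE_base."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


def composite_loss(
    pred_obs: Sequence[float], q_obs: Sequence[float],
    pred_peak: Sequence[float], q_syn_peak: Sequence[float],
    pred_base: Sequence[float], q_syn_base: Sequence[float],
    w_peak: float = 1.0, w_base: float = 1.0,  # hypothetical weights
) -> float:
    """Data-fitting loss plus weighted synthetic rainfall and drought constraint losses."""
    return (mse(pred_obs, q_obs)
            + w_peak * mse(pred_peak, q_syn_peak)
            + w_base * mse(pred_base, q_syn_base))
```

Raising `w_peak` or `w_base` trades some fit to the observed record for closer agreement with the synthetic extremes, so the weights act as tuning knobs for how strongly the physical constraints are enforced.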
To further improve the reproducibility of the physics-constrained training process, the construction of synthetic rainfall and drought datasets is described in detail as follows. The model schemes based on these physical mechanisms are summarized in Table 3.
  1. Synthetic rainfall dataset
The top 2% peak-runoff events in the training period were first identified, and the corresponding observed rainfall–runoff segments were extracted and merged into a baseline flood-scenario dataset. These events represent near-saturated soil conditions under which atmospheric humidity is high and evapotranspiration becomes negligible. Under such conditions, the water-balance relationship can be simplified such that runoff is approximately equal to the total rainfall.
To expand the range of extreme rainfall inputs, the total event rainfall was set to 50–150 mm with 10 mm increments. Considering the short-duration and concentrated storm patterns characteristic of the study basin, three temporal distribution patterns were designed: (i) all rainfall occurring on Day 1; (ii) a 75%/25% distribution across Days 1–2; and (iii) a 50%/25%/25% distribution across Days 1–3. For each rainfall amount, synthetic samples corresponding to the three distribution patterns were selected in balanced proportions, with randomness only in choosing the baseline flood events from the historical dataset. This procedure yields a synthetic rainfall dataset that captures diverse flood intensities and temporal structures.
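The rainfall-set construction above can be sketched in Python (the language of the authors' implementation). This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the top-2% selection is applied to daily runoff values rather than to delineated flood events, and each synthetic sample's runoff target is written as a depth in mm equal to the rainfall depth ($Q \approx P$). Converting that depth to discharge would require the basin area, which is not given in this section.

```python
import numpy as np

# Temporal patterns from the text: share of event rainfall on Day 1, Day 2, Day 3.
PATTERNS = {
    "day1": [1.0],
    "day1-2": [0.75, 0.25],
    "day1-3": [0.50, 0.25, 0.25],
}


def top_peak_days(runoff, fraction=0.02):
    """Indices of days whose runoff lies in the top `fraction` of the training record."""
    runoff = np.asarray(runoff, dtype=float)
    threshold = np.quantile(runoff, 1.0 - fraction)
    return np.flatnonzero(runoff >= threshold)


def synthetic_rainfall_events(baseline_ids, seed=0):
    """Return a list of (baseline_id, pattern, rainfall_mm, runoff_mm) tuples.

    Saturated-soil assumption: E ~ 0 and dS ~ 0, so runoff depth equals rainfall depth.
    """
    rng = np.random.default_rng(seed)
    events = []
    for total in range(50, 151, 10):               # 50-150 mm in 10 mm increments
        for name, shares in PATTERNS.items():      # one sample per pattern -> balanced
            rainfall = total * np.asarray(shares)
            baseline = int(rng.choice(baseline_ids))  # the only random choice
            events.append((baseline, name, rainfall, rainfall.copy()))
    return events


if __name__ == "__main__":
    demo_runoff = np.arange(100.0)
    peaks = top_peak_days(demo_runoff)
    events = synthetic_rainfall_events(peaks)
    print(len(peaks), len(events))  # 2 peak days; 11 amounts x 3 patterns = 33 events
```

Generating one sample per pattern for every rainfall amount is one simple way to obtain the "balanced proportions" described in the text; the source does not say how many samples were drawn per combination.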
2. Synthetic drought dataset
Drought scenarios were identified using a "no-rainfall period of at least 15 consecutive days" criterion, during which runoff remains persistently low. All historical segments in the training period satisfying this definition were collected and merged into a baseline drought dataset. In generating synthetic drought samples, rainfall was fixed to zero, while evapotranspiration (2–4 mm/day) and low-flow runoff values (2–10 m³/s) were randomly drawn from the observed dry-season records. This method produces hydrologically realistic synthetic drought scenarios representing prolonged deficit conditions.
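The drought-set steps can be sketched as follows; the function names are hypothetical and this is not code from the authors' repository. It detects rain-free runs of at least 15 days and assembles zero-rainfall samples by resampling observed dry-season evapotranspiration (2–4 mm/day) and runoff (2–10 m³/s) values. Two choices are our own assumptions: a day counts as dry only when rainfall is exactly zero (a small wet-day threshold such as 0.1 mm is a common alternative), and the resampled runoff is sorted in descending order so that each sample resembles a recession rather than random day-to-day noise.

```python
import numpy as np


def dry_spells(rain, min_days=15, wet_threshold=0.0):
    """Return (start, stop) index pairs (stop exclusive) of rain-free runs >= min_days long."""
    dry = np.asarray(rain, dtype=float) <= wet_threshold
    spells, start = [], None
    for i, is_dry in enumerate(dry):
        if is_dry and start is None:
            start = i                              # a dry run begins
        elif not is_dry and start is not None:
            if i - start >= min_days:
                spells.append((start, i))          # keep runs that are long enough
            start = None
    if start is not None and len(dry) - start >= min_days:
        spells.append((start, len(dry)))           # run reaching the end of the record
    return spells


def synthetic_drought_sample(obs_et, obs_q, n_days, rng,
                             et_range=(2.0, 4.0), q_range=(2.0, 10.0)):
    """Zero rainfall plus ET (mm/day) and runoff (m3/s) resampled from observed dry-season values."""
    obs_et, obs_q = np.asarray(obs_et, dtype=float), np.asarray(obs_q, dtype=float)
    et_pool = obs_et[(obs_et >= et_range[0]) & (obs_et <= et_range[1])]
    q_pool = obs_q[(obs_q >= q_range[0]) & (obs_q <= q_range[1])]
    rain = np.zeros(n_days)
    et = rng.choice(et_pool, size=n_days)
    runoff = np.sort(rng.choice(q_pool, size=n_days))[::-1]  # receding low flow (assumption)
    return rain, et, runoff
```

For example, `synthetic_drought_sample(et_dry, q_dry, 15, np.random.default_rng(0))` would return one 15-day sample, where `et_dry` and `q_dry` are hypothetical arrays of dry-season observations.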
  • Physical Constraint Loss Function:
During model training, the conventional data-fitting loss was extended into a composite loss function with three components: the data-fitting loss on observed samples ($MSE_{data}$), the synthetic rainfall event constraint loss ($MSE_{peak}$), and the synthetic drought event constraint loss ($MSE_{base}$). By minimizing this composite loss, the model is guided not only to fit the observed data but also to conform more closely to hydrological physical laws when predicting peak flows and baseflows. The constraint terms are intended to discourage unrealistic behavior such as negative runoff or excessively large baseflow; they penalize such predictions rather than strictly preventing them. The loss function is formulated as follows:
$$loss = \lambda_{data}\, MSE_{data} + \lambda_{peak}\, MSE_{peak} + \lambda_{base}\, MSE_{base}$$
In this equation, $loss$ represents the total loss of the model; $MSE_{data}$ denotes the loss between the observed and simulated runoff; and $\lambda_{data}$, $\lambda_{peak}$, and $\lambda_{base}$ are the weighting coefficients of the three loss components. To determine an appropriate weighting configuration, a sensitivity analysis was conducted. Specifically, $\lambda_{data}$ was varied from 0.95 to 0.65 (step = 0.05), while $\lambda_{peak}$ and $\lambda_{base}$ were assigned equal values and jointly adjusted from 0.025 to 0.175 (step = 0.025), subject to the constraint $\lambda_{data} + \lambda_{peak} + \lambda_{base} = 1$. Model performance, measured by the Nash–Sutcliffe Efficiency (NSE), gradually improved as $\lambda_{data}$ decreased from 0.95 to 0.80, rising from 0.878 to 0.895. However, when $\lambda_{data}$ was further reduced to 0.65, the NSE declined back to 0.878, indicating that both overly weak and overly strong physical constraints degrade model performance. Based on these findings, the final configuration was $\lambda_{data} = 0.80$ and $\lambda_{peak} = \lambda_{base} = 0.10$.
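The composite objective and the weight grid of the sensitivity analysis can be written compactly. In this sketch (function names are ours, not the authors'), `mse` uses only arithmetic and `.mean()`, so the same code evaluates NumPy arrays and, in a PyTorch training loop, tensors through which autograd can backpropagate. The three prediction arguments are assumed to be the model's outputs for an observed batch, a synthetic-rainfall batch, and a synthetic-drought batch; how the authors interleave these batches is not stated here.

```python
import numpy as np


def mse(pred, target):
    """Mean squared error built from arithmetic and .mean() only, so it also works on torch tensors."""
    return ((pred - target) ** 2).mean()


def composite_loss(pred_obs, q_obs, pred_peak, q_peak_syn, pred_base, q_base_syn,
                   lam_data=0.80, lam_peak=0.10, lam_base=0.10):
    """lam_data*MSE_data + lam_peak*MSE_peak + lam_base*MSE_base (defaults: final weights)."""
    return (lam_data * mse(pred_obs, q_obs)
            + lam_peak * mse(pred_peak, q_peak_syn)
            + lam_base * mse(pred_base, q_base_syn))


def weight_grid(steps=7):
    """Sensitivity grid: lam_data = 0.95, 0.90, ..., 0.65 and lam_peak = lam_base = (1 - lam_data) / 2."""
    grid = []
    for k in range(steps):
        lam_data = round(0.95 - 0.05 * k, 4)
        lam_side = round((1.0 - lam_data) / 2.0, 4)   # 0.025, 0.05, ..., 0.175
        grid.append((lam_data, lam_side, lam_side))
    return grid


obs, sim = np.array([1.0, 2.0]), np.array([1.0, 4.0])
print(mse(sim, obs))                                  # 2.0
print(composite_loss(sim, obs, obs, obs, obs, obs))   # 0.8 * 2.0 = 1.6
```

Deriving $\lambda_{peak} = \lambda_{base} = (1 - \lambda_{data})/2$ enforces the sum-to-one constraint automatically, which is why the grid has exactly seven settings.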

2.5. Evaluation Metrics

To comprehensively assess model prediction accuracy, three widely used hydrological evaluation metrics were selected: the NSE, mean absolute error (MAE), and root mean square error (RMSE). These metrics quantify different aspects of model performance and collectively provide a robust evaluation framework.
NSE measures the agreement between simulated and observed runoff, ranging from −∞ to 1. Values closer to 1 indicate a higher level of model reliability. Its calculation formula is:
$$NSE = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
MAE reflects the average absolute error between the predicted and observed values. Its range is 0 to +∞, and values closer to 0 indicate smaller model errors. Its calculation formula is:
$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
RMSE reflects the degree of deviation between the predicted and observed values. It also ranges from 0 to +∞, and values closer to 0 indicate smaller prediction errors. Its calculation formula is:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
In the above formulas, $y_i$ denotes the observed runoff at time step $i$; $\bar{y}$ denotes the mean observed runoff; $\hat{y}_i$ denotes the predicted runoff at time step $i$; and $n$ denotes the total number of observations.
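For reference, the three metrics translate directly into NumPy. This is a straightforward transcription of the formulas above, not the authors' evaluation script; the function names are our own.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 - SSE / sum of squared deviations of obs from their mean."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def mae(obs, sim):
    """Mean absolute error, in the units of the runoff series."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return np.mean(np.abs(obs - sim))


def rmse(obs, sim):
    """Root mean square error, in the units of the runoff series."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return np.sqrt(np.mean((obs - sim) ** 2))


obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 5.0]
print(nse(obs, sim), mae(obs, sim), rmse(obs, sim))  # 0.8 0.25 0.5
```

A model that always predicts the observed mean scores NSE = 0, which is why negative NSE values signal a model worse than that trivial benchmark.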

3. Results and Discussion

3.1. Runoff Simulation Results of Process-Driven Models

To maintain consistency with the training and validation periods used by the subsequent machine learning models, both process-driven hydrological models (Xinanjiang and SWAT) were calibrated on data from 2005 to 2018; the calibrated parameters were then held fixed to simulate runoff for the 2019–2021 validation period.
The results indicate that both models performed well in the study area, as summarized in Table 4. The NSE values for the calibration period reached 0.829 and 0.806 for the Xinanjiang and SWAT models, respectively, while those for the validation period were 0.840 and 0.825. Comparative analysis shows that the Xinanjiang model achieved slightly higher simulation accuracy than the SWAT model, with better performance in all evaluation metrics during both the calibration and validation periods. The detailed daily runoff simulation results for both models during the validation period are shown in Figure 6 and Figure 7.
Although the SWAT model can, in principle, represent hydrological processes across heterogeneous landscapes in more detail by accounting for spatial variability, its performance in this study area was slightly inferior to that of the Xinanjiang model. This may be attributed to the combined influence of regional factors such as topography, land use, and meteorological conditions, under which the simpler structure and fewer parameters of the Xinanjiang model allowed it to adapt better to the hydrological characteristics of the basin. Overall, the Xinanjiang model is considered more suitable for daily runoff simulation in this study area.
Figure 7. Daily Runoff Simulation Results of the Xinanjiang Model during the Validation Period.

3.2. Runoff Simulation Results of Data-Driven Models

In the process-driven modeling stage, two types of meteorological data—observed and gridded—were used for runoff simulation. To ensure comparability, the same two datasets were adopted as input for the data-driven models in this section, with historical runoff data additionally included to improve simulation and prediction accuracy. The effects of different input data types and model structures on runoff prediction performance were analyzed by comparing evaluation metrics across experiments.
Four data-driven models (SVR, XGBoost, GRU, and Seq2seq) were employed in this study, and five input data combinations were designed:
  • Scheme 1: historical runoff only;
  • Scheme 2: observed rainfall and evaporation;
  • Scheme 3: observed rainfall, evaporation, and historical runoff;
  • Scheme 4: six gridded meteorological variables; and
  • Scheme 5: six gridded meteorological variables together with historical runoff.
Each of the five input combinations was applied to the four data-driven models, resulting in a total of 20 model configurations. The objective was to compare and analyze the impact of different input datasets and model architectures on prediction accuracy. The specific scheme configurations are summarized in Table 5.
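The naming convention of the 20 configurations (model name followed by scheme number, as in XGB-3 or Seq2seq-3) can be reproduced with a small lookup. The variable names below are placeholders of our own: the text identifies the gridded variables only as including rainfall, relative humidity, and temperature, so all six are labeled generically.

```python
from itertools import product

MODELS = ["SVR", "XGB", "GRU", "Seq2seq"]

# Placeholder variable names; the six gridded variables are labeled generically.
GRIDDED = [f"grid_var{k}" for k in range(1, 7)]
SCHEMES = {
    1: ["runoff_obs"],
    2: ["rain_obs", "evap_obs"],
    3: ["rain_obs", "evap_obs", "runoff_obs"],
    4: GRIDDED,
    5: GRIDDED + ["runoff_obs"],
}

# One configuration per (model, scheme) pair, named like "XGB-3" or "Seq2seq-3".
configs = {f"{model}-{scheme}": SCHEMES[scheme] for model, scheme in product(MODELS, SCHEMES)}

print(len(configs))  # 20
```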
A comparison of the best-performing schemes among different models is presented in Table 6. As shown in the table, the XGB-3 and GRU-3 models exhibit comparable prediction accuracy, with NSE values of 0.844 and 0.846, respectively, during the validation period. The Seq2seq-3 model achieved the highest accuracy, with a validation NSE of 0.859, indicating that under identical input data conditions, the Seq2seq model performs best in this study area.
A comparison of the effects of different input schemes on runoff prediction accuracy is illustrated in Figure 8. As shown in the figure, during the validation period, all models achieved their highest NSE values when using the input data from Scheme 3, followed by Scheme 2. In contrast, when using Schemes 4 and 5, the NSE values during the validation period generally ranged between 0.5 and 0.6, indicating relatively poor predictive performance.
Scheme 2 was based on observed rainfall and evaporation data from rain gauge and hydrological stations, while Scheme 3 further incorporated observed runoff data, leading to noticeable improvements in the results across all four data-driven models compared with Scheme 2. Scheme 4 utilized gridded meteorological data—including rainfall, relative humidity, and temperature—and Scheme 5 additionally included observed runoff data. However, the predictive performance of Schemes 4 and 5 did not differ significantly from that of Scheme 1, which used only observed runoff as input.
These results suggest that the relatively small size of the study basin and the coarse spatial resolution of the gridded dataset limit its ability to accurately capture regional variations. To further substantiate this conclusion, we conducted a quantitative evaluation of the CN05.1 gridded precipitation dataset. CN05.1 has a spatial resolution of 0.25° × 0.25°, with only six grid cells covering the study basin. Basin-averaged daily rainfall was derived for both station observations and gridded data using the Thiessen polygon method. The evaluation results show that CN05.1 exhibits noticeable discrepancies relative to gauge observations, with MAE = 3.97 mm, RMSE = 8.69 mm, and NSE = 0.40. The basin-wide mean bias is −0.255 mm (bias ratio −2.9%), indicating a slight overall underestimation. In addition, the mean bias at mountainous stations (−0.48 mm) is substantially larger than that at plain stations (−0.10 mm), reflecting the inability of coarse-resolution grids to capture orographic precipitation enhancement.
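The gauge-versus-grid comparison can be sketched as follows. This is an illustration, not the authors' script: the Thiessen weights are taken as given area fractions (in practice they come from intersecting the station polygons with the basin boundary in a GIS), and the bias ratio is computed as mean bias divided by mean gauge rainfall. That is one common definition; the text does not state which definition or sample (all days or wet days only) was used.

```python
import numpy as np


def areal_mean(values, weights):
    """Basin-average rainfall from a (days, sites) array and Thiessen area weights."""
    w = np.asarray(weights, dtype=float)
    return np.asarray(values, dtype=float) @ (w / w.sum())


def compare_to_gauges(gauge, grid):
    """Error statistics of basin-mean gridded rainfall against basin-mean gauge rainfall."""
    gauge, grid = np.asarray(gauge, dtype=float), np.asarray(grid, dtype=float)
    err = grid - gauge                      # negative values mean the grid underestimates
    bias = err.mean()
    return {
        "MAE": np.abs(err).mean(),
        "RMSE": np.sqrt((err ** 2).mean()),
        "NSE": 1.0 - (err ** 2).sum() / ((gauge - gauge.mean()) ** 2).sum(),
        "bias": bias,
        "bias_ratio": bias / gauge.mean(),   # assumed definition, see lead-in
    }
```

The same function applied separately to mountainous and plain stations would give the kind of stratified bias reported above.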
Considering the spatial continuity of rainfall, we further computed Pearson correlation coefficients for all 15 pairwise combinations among the six stations/grids using annual mean precipitation from 2005 to 2020. The average spatial correlation among the gauge stations is 0.911, whereas CN05.1 shows a much higher average correlation of 0.975. This inflated correlation indicates a pronounced smoothing effect: the coarse grid spacing suppresses spatial variability and causes neighboring grid cells to become overly similar, thereby artificially increasing spatial correlation.
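The spatial-correlation check reduces to averaging the upper triangle of a correlation matrix: with six sites there are 6 × 5 / 2 = 15 distinct pairs. A minimal sketch, assuming the annual mean series are arranged with one column per station (or grid cell); the function name is our own.

```python
import numpy as np


def mean_pairwise_correlation(series):
    """Mean Pearson r over all distinct column pairs of a (years, sites) array."""
    r = np.corrcoef(np.asarray(series, dtype=float), rowvar=False)
    upper = np.triu_indices_from(r, k=1)   # each pair once, diagonal excluded
    return r[upper].mean(), upper[0].size
```

Running it once on the gauge matrix and once on the CN05.1 matrix yields the two averages being compared.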
These results indicate that, although CN05.1 is internally highly consistent in space, its large daily errors, its underestimation at mountainous stations, and its weaker performance as a model input are mainly attributable to its coarse spatial resolution, which fails to represent the true spatial heterogeneity and localized rainfall features of the study basin. In contrast, gauge observations capture spatial rainfall variability within the basin more effectively. Moreover, rainfall and evaporation data play a non-negligible role in improving runoff prediction accuracy in this region.
Figure 8. Bar Chart of NSE Evaluation Indicators for Model Runoff Prediction Results.
Based on multiple evaluation metrics, this section evaluated and compared the performance of four data-driven models (SVR, XGBoost, GRU, and Seq2seq) in daily runoff prediction. The Seq2seq model exhibited the best overall performance, with Scheme Seq2seq-3 achieving an NSE of 0.859, higher than all other models. The GRU and XGB models demonstrated comparable accuracy, with NSE values of 0.846 and 0.844, respectively, while the SVR model performed worst, with an NSE of 0.723.
Through comparative analysis of different input schemes, it was found that all models achieved the highest NSE values when using the input data from Scheme 3, followed by Scheme 2, while Schemes 4 and 5 showed relatively poor predictive performance. These findings indicate that, in this study area, observed data outperform gridded datasets, and the inclusion of observed runoff data plays a crucial role in enhancing the predictive capability of data-driven models.
In addition to predictive performance, we also evaluated the computational practicality of the data-driven models, as operational flood forecasting requires models that are both accurate and efficient. All experiments were conducted on a standard workstation. For a dataset of approximately 5000 time-series samples, the GRU model required about 36 s to complete 100 training epochs, while the Seq2seq model required around 48 s due to its encoder–decoder architecture. Despite the moderately higher training cost, the Seq2seq model’s improved multi-step prediction accuracy suggests that its computational demands remain acceptable for practical hydrological forecasting applications.

3.3. Runoff Simulation Results of Physics–Data-Driven Hybrid Models

1. Integration of Physical Model Outputs as Additional Inputs to Data-Driven Models
When the outputs of the physical models (Xinanjiang and SWAT) were incorporated as additional features into the data-driven models, runoff prediction accuracy improved substantially.
  • Integration with the Xinanjiang model:
When the surface runoff data generated by the Xinanjiang model were integrated, the Seq2seq model (Scheme XAJ-Seq2seq-2) achieved a validation NSE of 0.912, representing a substantial improvement over the best pre-integration scheme (Seq2seq-3, NSE = 0.859). This demonstrates that intermediate variables provided by the physical model, which directly reflect the runoff generation process, serve as highly valuable prior information for data-driven models.
  • Integration with the SWAT model:
Similarly, after incorporating the soil moisture data from the SWAT model, the Seq2seq model (Scheme SWAT-Seq2seq-1) also achieved a validation NSE of 0.912. This finding indicates that the inclusion of a key physical variable—basin water storage status (soil moisture)—helps the model more accurately determine runoff responses following rainfall events.
Overall, the success of the feature-fusion strategy verifies that physical models can provide valuable internal hydrological state information that data-driven models cannot directly extract from raw observations, thereby effectively enhancing prediction accuracy.
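Mechanically, feature fusion amounts to adding the physics-model output as extra lagged input columns. The sketch below builds a one-step-ahead design matrix in which every driver, including a simulated Xinanjiang runoff-generation or SWAT soil-moisture series, enters at lags t − 1, t − 2, and so on. The column names and lag set are illustrative assumptions, since the exact lags used for each scheme are not given in this section. The sketch also assumes the simulated series is produced only from forcing available at forecast time, so that no future information leaks into the features.

```python
import numpy as np


def lagged_features(columns, target, lags=(1,)):
    """Build a one-step-ahead design matrix: row j uses inputs at t - lag and target at t.

    columns: dict of name -> 1-D series of equal length (observed drivers and, for
    feature fusion, physics-model outputs such as simulated runoff generation).
    """
    max_lag = max(lags)
    names, feats = [], []
    for name, values in columns.items():
        values = np.asarray(values, dtype=float)
        for lag in lags:
            names.append(f"{name}(t-{lag})")
            feats.append(values[max_lag - lag: len(values) - lag])  # aligned to t = max_lag..end
    X = np.column_stack(feats)
    y = np.asarray(target, dtype=float)[max_lag:]
    return X, y, names


series = {
    "rain": [0.0, 1.0, 2.0, 3.0, 4.0],
    "xaj_runoff": [10.0, 11.0, 12.0, 13.0, 14.0],   # hypothetical Xinanjiang output
}
X, y, names = lagged_features(series, [100.0, 101.0, 102.0, 103.0, 104.0], lags=(1, 2))
print(names)   # ['rain(t-1)', 'rain(t-2)', 'xaj_runoff(t-1)', 'xaj_runoff(t-2)']
```

Swapping `xaj_runoff` for a SWAT soil-moisture series leaves the construction unchanged; only the extra column differs between the fusion schemes.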
2. Evaluation of the Physics-Constrained Strategy
Model performance also improved through the introduction of synthetic datasets and physically constrained loss functions, particularly in the simulation of extreme events, as summarized in Table 7.
  • Overall accuracy improvement:
Under the optimal scheme (PG-Seq2seq-4), the validation NSE reached 0.898. Although this value is slightly lower than that of the feature-fusion strategy (0.912), it is still clearly higher than that of the best standalone data-driven model (Seq2seq-3, NSE = 0.859).
  • Improved flood-peak fitting:
Comparison of flood hydrographs showed that, although the physics-constrained models did not achieve the highest overall NSE, they captured flood peaks well; in particular, their predictions of the highest peaks of the validation period closely matched the observations. This supports the view that the synthetic rainfall dataset and rainfall-constraint loss function improve the model's responsiveness and accuracy for high-flow events.
Six optimal hybrid schemes were then compared with a focus on flood-period predictions, as illustrated in Figure 9: two integrating the Xinanjiang model (XAJ-Seq2seq-2 and XAJ-Seq2seq-3), two integrating the SWAT model (SWAT-Seq2seq-1 and SWAT-Seq2seq-2), and two physics-guided models (PG-Seq2seq-1 and PG-Seq2seq-4).
As shown in Figure 9, although the overall accuracy of the physics-guided models was slightly lower than that of the other schemes, PG-Seq2seq-1 and PG-Seq2seq-4 fitted runoff peaks better, and during the validation period they reproduced the highest daily runoff peaks more accurately. This behavior can be attributed to the synthetic rainfall dataset and rainfall-constraint loss function used in their training.

3.4. Interpretability Analysis

Using SHAP analysis, this study investigated the internal decision-making mechanisms of different hybrid models in runoff prediction. In addition to the qualitative visualization provided by the Global SHAP Contribution Plot (Figure 10), we further summarized the relative importance of key predictors by reporting their mean absolute SHAP values across samples, enabling a more objective comparison of feature importance.
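The mean |SHAP| ranking is a simple reduction over a SHAP value array. In this sketch the array is assumed to have shape (samples, features), or (samples, time steps, features) for a sequence model, in which case time steps are pooled with samples; with the `shap` package, such an array is typically the `values` attribute of an `Explanation` object. The function and feature names are hypothetical.

```python
import numpy as np


def mean_abs_shap(shap_values, feature_names):
    """Rank features by mean absolute SHAP value (largest first)."""
    vals = np.abs(np.asarray(shap_values, dtype=float))
    importance = vals.reshape(-1, vals.shape[-1]).mean(axis=0)  # pool all axes but the last
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so the ranking reflects magnitude of influence rather than its direction.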
1. Feature-Fusion Models:
In the Seq2seq model that incorporated runoff-generation data from the Xinanjiang model (XAJ-Seq2seq-2), the most influential feature was the simulated runoff generation at time t − 1, with a mean |SHAP| value of 0.44. In the model integrating simulated discharge, the simulated runoff at time t − 1 showed the highest importance (0.42).
In contrast, raw rainfall and historical runoff features exhibited lower contributions, with mean |SHAP| values ranging from 0.25 to 0.30. This indicates that the hybrid models successfully shifted their decision focus toward more physically meaningful hydrological variables.
In the SWAT-Seq2seq-1 model, rainfall at time t − 1 remained the most influential feature (mean |SHAP| = 0.35), followed by soil moisture at time t − 1, reflecting the hydrological principle that runoff generation depends jointly on rainfall and antecedent wetness.
2. Physics-Constrained Models:
In the physics-constrained models (PG-Seq2seq-1 and PG-Seq2seq-4), rainfall at time t − 1 remained the most influential feature. Notably, because numerous synthetic drought scenarios were included during training, the relative importance of evaporation increased, and its contribution was predominantly negative: higher evaporation corresponded to lower predicted runoff. This shift is consistent with hydrological theory, since evapotranspiration becomes the dominant control on soil-water depletion and streamflow recession under drought conditions, and it suggests that the physical constraints guided the model toward the processes that govern specific conditions such as drought periods, making its behavior more consistent with physical laws.

4. Conclusions

This study conducted a systematic investigation into hybridizing process-driven hydrological models and data-driven deep learning approaches for runoff forecasting. By designing, implementing, and comparing two distinct integration philosophies, this work provides a nuanced perspective on the strengths and applications of such synergistic models. The primary conclusions are as follows:
  • At the baseline model level, both process-driven models (Xinanjiang and SWAT) and data-driven models (particularly Seq2seq) achieved satisfactory runoff forecasting performance in the study area. Among them, the Seq2seq model performed best when combined with high-quality observed data, confirming the strong potential of deep learning in hydrological time-series forecasting.
  • The two proposed hybrid modeling strategies, physical feature fusion and physics-constrained embedding, substantially improved runoff prediction accuracy. The feature-fusion strategy yielded the highest overall accuracy (validation NSE = 0.912, compared with 0.859 for the best standalone model), while the physics-constrained strategy demonstrated distinct advantages in simulating extreme events such as flood peaks and base flows.
  • The interpretability analysis not only validated the rationality of model decision-making (e.g., dependence on antecedent rainfall and runoff) but also revealed the effective mechanisms of the hybridization strategies. Feature fusion guided the model to rely on physically meaningful integrated variables, whereas physical constraints directed the model’s attention toward specific hydrological processes (e.g., evaporation during droughts), thereby enhancing model reliability and transparency.
In summary, this study demonstrates that integrating hydrological physical processes with data-driven approaches is an effective means to improve runoff forecasting performance. The proposed hybrid framework achieves a balance between high accuracy and interpretability, providing robust scientific support for refined watershed management and flood control decision-making. Looking forward, future research could explore the development of fully differentiable process-based models to allow for even more seamless end-to-end physics-informed machine learning integration, test the generalization of these hybrid frameworks to ungauged or data-scarce basins, and incorporate a wider range of physical constraints. Ultimately, such hybrid approaches represent a promising step toward a new generation of more accurate, reliable, and scientifically sound models for understanding and managing the Earth’s complex water systems.

Author Contributions

Conceptualization, M.Z. and B.L.; methodology, M.Z., T.Y. and H.G. (Hongbin Gu); software, M.Z. and H.G. (Huanghe Gu); validation, W.W. and Y.P.; formal analysis, L.P. and H.G. (Huanghe Gu); investigation, H.G. (Hongbin Gu) and M.Z.; data curation, L.P.; writing—original draft preparation, H.G. (Huanghe Gu) and L.P.; writing—review and editing, L.P. and H.G. (Huanghe Gu); visualization, L.P.; supervision, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Scientific Research Projects of the TB Hydropower Station under the China Huaneng Group Science and Technology Project (HNKJ22-H87).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Due to data policies of relevant agencies in China, the meteorological and hydrological observation data used in this study cannot be publicly released. Other datasets cited in the manuscript are available through their respective official websites. To enhance reproducibility, the synthetic datasets, model files, and modeling workflow description used in this study have been organized and made publicly accessible on GitHub: https://github.com/hanhandian/PG_Seq_4/tree/main (accessed on 2 November 2025). The models were implemented using Python 3.10, with key dependencies including SHAP (0.49.1), PyTorch (2.10.0), NumPy (1.26.4), Pandas (2.3.3), and scikit-learn (1.6.1). A complete list of dependencies is provided in the GitHub repository.

Acknowledgments

The authors would like to thank the technical team of the TB Hydropower Station for providing hydrological and meteorological data support, as well as the administrative assistance offered by China Huaneng Group Co., Ltd. During the preparation of this manuscript, the authors used ChatGPT (OpenAI, GPT-5 model) for the purposes of language polishing and clarity improvement. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

Muzi Zhang, Tailun Yao, Hongbin Gu, Weiwei Wang, Linying Pan, Huanghe Gu, Ying Pei, and Baohong Lu declare the following: This study received funding from China Huaneng Group Co., Ltd. (Grant No. HNKJ22-H87). The funder and the TB Hydropower Station provided hydrological and meteorological data and administrative support. However, they had no involvement in the study design, data processing, data analysis, interpretation of the results, manuscript writing, or the decision to submit the manuscript for publication. The authors declare that there are no conflicts of interest. The Qiantang River Basin Center of Zhejiang Province is a public institution affiliated with the Department of Water Resources of Zhejiang Province, and is not a commercial company.

References

  1. Li, J.T.; Ai, P.; Xiong, C.S.; Song, Y.H. Coupled intelligent prediction model for medium- to long-term runoff based on teleconnection factors selection and spatial-temporal analysis. PLoS ONE 2024, 19, e0313871. [Google Scholar] [CrossRef]
  2. Kratzert, F.; Klotz, D.; Herrnegger, M.; Sampson, A.K.; Hochreiter, S.; Nearing, G.S. Toward Improved Predictions in Ungauged Basins: Exploiting the Power of Machine Learning. Water Resour. Res. 2019, 55, 11344–11354. [Google Scholar] [CrossRef]
  3. Montanari, A.; Koutsoyiannis, D. A blueprint for process-based modeling of uncertain hydrological systems. Water Resour. Res. 2012, 48, W09555. [Google Scholar] [CrossRef]
  4. Zhang, R.K.; Cheng, L.; Liu, P.; Huang, K.D.; Gong, Y.; Qin, S.J.; Liu, D.D. Effect of GCM credibility on water resource system robustness under climate change based on decision scaling. Adv. Water Resour. 2021, 158, 104063. [Google Scholar] [CrossRef]
  5. Ardabili, S.; Mosavi, A.; Dehghani, M.; Várkonyi-Kóczy, A.R. Deep Learning and Machine Learning in Hydrological Processes Climate Change and Earth Systems a Systematic Review. In Proceedings of the Engineering for Sustainable Future, Budapest, Hungary, 4–7 September 2019; pp. 52–62. [Google Scholar]
  6. Di Nunno, F.; de Marinis, G.; Granata, F. Short-term forecasts of streamflow in the UK based on a novel hybrid artificial intelligence algorithm. Sci. Rep. 2023, 13, 7036. [Google Scholar] [CrossRef]
  7. Wijayarathne, D.B.; Coulibaly, P. Identification of hydrological models for operational flood forecasting in St. John’s, Newfoundland, Canada. J. Hydrol.-Reg. Stud. 2020, 27, 100646. [Google Scholar] [CrossRef]
  8. Gong, J.F.; Yao, C.; Li, Z.J.; Chen, Y.F.; Huang, Y.C.; Tong, B.X. Improving the flood forecasting capability of the Xinanjiang model for small- and medium-sized ungauged catchments in South China. Nat. Hazards 2021, 106, 2077–2109. [Google Scholar] [CrossRef]
  9. Zhao, J.; Zhang, N.; Liu, Z.C.; Zhang, Q.; Shang, C.W. SWAT model applications: From hydrological processes to ecosystem services. Sci. Total Environ. 2024, 931, 172605. [Google Scholar] [CrossRef]
  10. Kumar, S.; Choudhary, M.K.; Thomas, T. A hybrid technique to enhance the rainfall-runoff prediction of physical and data-driven model: A case study of Upper Narmada River Sub-basin, India. Sci. Rep. 2024, 14, 26263. [Google Scholar] [CrossRef]
  11. Latif, S.D.; Ahmed, A.N. A review of deep learning and machine learning techniques for hydrological inflow forecasting. Environ. Dev. Sustain. 2023, 25, 12189–12216. [Google Scholar] [CrossRef]
  12. Lu, M.S.; Hou, Q.Y.; Qin, S.J.; Zhou, L.H.; Hua, D.; Wang, X.X.; Cheng, L. A Stacking Ensemble Model of Various Machine Learning Models for Daily Runoff Forecasting. Water 2023, 15, 1265. [Google Scholar] [CrossRef]
  13. Chang, C.W.; Dinh, N.T. Classification of machine learning frameworks for data-driven thermal fluid models. Int. J. Therm. Sci. 2019, 135, 559–579. [Google Scholar] [CrossRef]
  14. Fan, J.L.; Yue, W.J.; Wu, L.F.; Zhang, F.C.; Cai, H.J.; Wang, X.K.; Lu, X.H.; Xiang, Y.Z. Evaluation of SVM, ELM and four tree-based ensemble models for predicting daily reference evapotranspiration using limited meteorological data in different climates of China. Agric. For. Meteorol. 2018, 263, 225–241. [Google Scholar] [CrossRef]
  15. Cai, Y.P.; Guan, K.Y.; Lobell, D.; Potgieter, A.B.; Wang, S.W.; Peng, J.; Xu, T.F.; Asseng, S.; Zhang, Y.G.; You, L.Z.; et al. Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches. Agric. For. Meteorol. 2019, 274, 144–159. [Google Scholar] [CrossRef]
  16. Zewdie, G.K.; Lary, D.J.; Liu, X.; Wu, D.J.; Levetin, E. Estimating the daily pollen concentration in the atmosphere using machine learning and NEXRAD weather radar data. Environ. Monit. Assess. 2019, 191, 418. [Google Scholar] [CrossRef]
  17. Kiziloz, B. Prediction of daily failure rate using the serial triple diagram model and artificial neural network. Water Supply 2022, 22, 7040–7058. [Google Scholar] [CrossRef]
  18. Ghobadi, F.; Kang, D. Multi-Step Ahead Probabilistic Forecasting of Daily Streamflow Using Bayesian Deep Learning: A Multiple Case Study. Water 2022, 14, 3672. [Google Scholar] [CrossRef]
  19. Zhang, L.; Jiang, Z.Q.; He, S.S.; Duan, J.F.; Wang, P.F.; Zhou, T. Study on Water Quality Prediction of Urban Reservoir by Coupled CEEMDAN Decomposition and LSTM Neural Network Model. Water Resour. Manag. 2022, 36, 3715–3735. [Google Scholar] [CrossRef]
  20. Park, K.; Jung, Y.; Seong, Y.; Lee, S. Development of Deep Learning Models to Improve the Accuracy of Water Levels Time Series Prediction through Multivariate Hydrological Data. Water 2022, 14, 469. [Google Scholar] [CrossRef]
  21. Xiang, Z.R.; Yan, J.; Demir, I. A Rainfall-Runoff Model with LSTM-Based Sequence-to-Sequence Learning. Water Resour. Res. 2020, 56, e2019WR025326. [Google Scholar] [CrossRef]
  22. Nayak, P.C.; Venkatesh, B.; Krishna, B.; Jain, S.K. Rainfall-runoff modeling using conceptual, data driven, and wavelet based computing approach. J. Hydrol. 2013, 493, 57–67. [Google Scholar] [CrossRef]
  23. Lyubchich, V.; Newlands, N.K.; Ghahari, A.; Mahdi, T.; Gel, Y.R. Insurance risk assessment in the face of climate change: Integrating data science and statistics. Wiley Interdiscip. Rev.-Comput. Stat. 2019, 11, e1462. [Google Scholar] [CrossRef]
  24. Liu, C.S.; Xie, T.N.; Li, W.Z.; Hu, C.H.; Jiang, Y.Q.; Li, R.X.; Song, Q.K. Research on machine learning hybrid framework by coupling grid-based runoff generation model and runoff process vectorization for flood forecasting. J. Environ. Manag. 2024, 364, 121466.
  25. Ahmad, I.; Farooq, R.; Ashraf, M.; Waseem, M.; Shangguan, D.H. Improving flood hazard susceptibility assessment by integrating hydrodynamic modeling with remote sensing and ensemble machine learning. Nat. Hazards 2025, 121, 7839–7868.
  26. Ullah, A.; Haider, S.; Farooq, R. Sensitivity analysis of a 2D flood inundation model. A case study of Tous Dam. Environ. Earth Sci. 2024, 83, 213.
  27. Syed, Z.; Mahmood, P.; Haider, S.; Ahmad, S.; Jadoon, K.Z.; Farooq, R.; Syed, S.; Ahmad, K. Short-long-term streamflow forecasting using a coupled wavelet transform-artificial neural network (WT-ANN) model at the Gilgit River Basin, Pakistan. J. Hydroinform. 2023, 25, 881–894.
  28. Zhang, J.Q.; Li, J.; Zhao, H.Y.Z.; Wang, W.; Lv, N.; Zhang, B.W.; Liu, Y.; Yang, X.Y.; Guo, M.J.; Dong, Y.H. Impact Assessment of Coupling Mode of Hydrological Model and Machine Learning Model on Runoff Simulation: A Case of Washington. Atmosphere 2024, 15, 1461.
  29. Wang, S.; Peng, H. Multiple spatio-temporal scale runoff forecasting and driving mechanism exploration by K-means optimized XGBoost and SHAP. J. Hydrol. 2024, 630, 130650.
  30. Mfwango, L.H.; Ayenew, T.; Mahoo, H.F. Impacts of climate and land use/cover changes on streamflow at Kibungo sub-catchment, Tanzania. Heliyon 2022, 8, e11285.
  31. Huang, T.T.; Liu, Y.; Jia, Z.F.; Zou, J.; Xiao, P.Q. Applicability of attribution methods for identifying runoff changes in changing environments. Sci. Rep. 2024, 14, 26100.
  32. Yuan, L.; Sinshaw, T.; Forshay, K.J. Review of Watershed-Scale Water Quality and Nonpoint Source Pollution Models. Geosciences 2020, 10, 25.
  33. Ghobadi, F.; Kang, D. Application of Machine Learning in Water Resources Management: A Systematic Literature Review. Water 2023, 15, 620.
  34. Yin, C.L.; Zhu, Y.F.; Fei, J.L.; He, X.Z. A Deep Learning Approach for Intrusion Detection Using Recurrent Neural Networks. IEEE Access 2017, 5, 21954–21961.
  35. Pan, L. Comparison of Kernel Functions and Parameter Selection of SVM Classification Algorithms. Master’s Thesis, University of California, Los Angeles, CA, USA, 2023.
  36. Gizaw, M.S.; Gan, T.Y. Regional Flood Frequency Analysis using Support Vector Regression under historical and future climate. J. Hydrol. 2016, 538, 387–398.
  37. Wu, J.; Liu, H.; Wei, G.Z.; Song, T.Y.; Zhang, C.; Zhou, H.C. Flash Flood Forecasting Using Support Vector Regression Model in a Small Mountainous Catchment. Water 2019, 11, 1327.
  38. Chen, T.Q.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’16), San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  39. Rondinone, M.; Dal Sasso, S.F.; Aung, H.H.; Contillo, L.; Dimola, G.; Schiattarella, M.; Fiorentino, M.; Telesca, V. Assessing Flood and Landslide Susceptibility Using XGBoost: Case Study of the Basento River in Southern Italy. Appl. Sci. 2025, 15, 5290.
  40. Sanders, W.; Li, D.F.; Li, W.Z.; Fang, Z.N. Data-Driven Flood Alert System (FAS) Using Extreme Gradient Boosting (XGBoost) to Forecast Flood Stages. Water 2022, 14, 747.
  41. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
  42. Karamvand, A.; Hosseini, S.A.; Azizi, S.A. Enhancing streamflow simulations with gated recurrent units deep learning models in the flood prone region with low-convergence streamflow data. Phys. Chem. Earth 2024, 136, 103737.
  43. Wang, W.; Gao, J.; Liu, Z.; Li, C.Q. A hybrid rainfall-runoff model: Integrating initial loss and LSTM for improved forecasting. Front. Environ. Sci. 2023, 11, 1261239.
  44. Ghaith, M.; Siam, A.; Li, Z.; El-Dakhakhni, W. Hybrid Hydrological Data-Driven Approach for Daily Streamflow Forecasting. J. Hydrol. Eng. 2020, 25, 04019063.
  45. Zhang, C.C.; Xue, H.Z.; Dong, G.T.; Jing, H.T.; He, S. Runoff Estimation Based on Hybrid-Physics-Data Model. In Proceedings of the XXIV ISPRS Congress: Imaging Today, Foreseeing Tomorrow, Commission III, Nice, France, 6–11 June 2022; pp. 347–352.
  46. Jiang, Z.F.; Lu, B.H.; Zhou, Z.G.; Zhao, Y.R. Comparison of Process-Driven SWAT Model and Data-Driven Machine Learning Techniques in Simulating Streamflow: A Case Study in the Fenhe River Basin. Sustainability 2024, 16, 6074.
  47. Gao, S.; Zhang, S.; Huang, Y.F.; Han, J.C.; Zhang, T.; Wang, G.Q. A hydrological process-based neural network model for hourly runoff forecasting. Environ. Model. Softw. 2024, 176, 106029.
Figure 1. River network and station distribution in the catchment upstream of the Songxi Hydrological Station.
Figure 2. Internal Structure of the GRU.
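Figure 2 shows the gating structure of the GRU. As a rough companion to the figure, the sketch below implements one forward step of the standard GRU of Cho et al. [41] in NumPy. The weight names (`W_z`, `U_z`, `b_z`, and so on), the random initialisation, and the toy input sequence are illustrative assumptions; the article does not state which framework or gate convention its GRU layers use.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_cell(x_t, h_prev, p):
    """One GRU time step in the form of Cho et al. (2014).

    x_t    : input features at time t, shape (n_in,)
    h_prev : hidden state from time t-1, shape (n_hid,)
    p      : dict with W_* (n_hid, n_in), U_* (n_hid, n_hid), b_* (n_hid,)
    """
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev + p["b_z"])  # update gate
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev + p["b_r"])  # reset gate
    # Candidate state: the reset gate controls how much past memory enters
    h_tilde = np.tanh(p["W_h"] @ x_t + p["U_h"] @ (r * h_prev) + p["b_h"])
    # New state: convex blend of the previous state and the candidate
    return z * h_prev + (1.0 - z) * h_tilde


def init_params(n_in, n_hid, seed=0):
    """Hypothetical small random initialisation, for illustration only."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in ("z", "r", "h"):
        p[f"W_{gate}"] = rng.normal(0.0, 0.1, (n_hid, n_in))
        p[f"U_{gate}"] = rng.normal(0.0, 0.1, (n_hid, n_hid))
        p[f"b_{gate}"] = np.zeros(n_hid)
    return p


# Toy run: 7 time steps of 2 random input features (not basin data)
params = init_params(n_in=2, n_hid=4)
h = np.zeros(4)
for x_t in np.random.default_rng(1).random((7, 2)):
    h = gru_cell(x_t, h, params)
```

Starting from a zero state, every component of `h` stays strictly inside (−1, 1), because each update is a convex blend of the previous state and a `tanh` output. In practice a framework's built-in GRU layer would be used; some implementations (PyTorch's `torch.nn.GRU`, for instance) apply the reset gate after the recurrent matrix product, a minor variant of the form above.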
Figure 3. Structure of the GRU-Seq2seq-Attention Model.
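Figure 3 couples a GRU encoder-decoder with an attention layer. The article does not specify the scoring function in this section, so the sketch below assumes simple dot-product (Luong-style) scoring: at each decoding step the decoder state is compared with every encoder state, the scores are normalised with a softmax, and the resulting weights form a context vector. Function names, shapes, and the random toy states are hypothetical.

```python
import numpy as np


def softmax(v):
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()


def attention_context(dec_h, enc_states):
    """Dot-product attention for one decoder step.

    dec_h      : decoder hidden state, shape (n_hid,)
    enc_states : encoder hidden states for the input window, shape (T, n_hid)
    Returns the context vector (n_hid,) and the attention weights (T,).
    """
    scores = enc_states @ dec_h      # one alignment score per input time step
    weights = softmax(scores)        # non-negative, sums to 1 over time steps
    context = weights @ enc_states   # weighted average of encoder states
    return context, weights


rng = np.random.default_rng(0)
enc = rng.normal(size=(10, 8))   # toy: 10 input time steps, 8 hidden units
dec = rng.normal(size=8)
context, weights = attention_context(dec, enc)
```

The weights show which antecedent time steps the decoder relies on at a given lead time; the context vector is typically concatenated with the decoder state before the output layer.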
Figure 4. Overall workflow of the proposed hybrid runoff-forecasting framework.
Figure 5. Framework of the physically constrained data-driven runoff forecasting model.
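Figure 5 embeds physical knowledge in the training objective. The exact loss used in the article is not reproduced here; the sketch below only illustrates the general pattern named in the abstract (physics-based constraints plus synthetic data in the loss), under assumptions of its own: a mean-squared error on observed discharge, a weighted error term on synthetic samples generated by a process-based model (possibly under the different rainfall distribution types of Table 3), and a penalty on physically impossible negative discharge. The function name and the weights `lam_syn` and `lam_phys` are hypothetical.

```python
import numpy as np


def physics_guided_loss(q_pred_obs, q_obs, q_pred_syn, q_syn,
                        lam_syn=0.5, lam_phys=1.0):
    """Composite loss for a physically constrained runoff model (illustrative).

    q_pred_obs, q_obs : network predictions and observations on the real record
    q_pred_syn, q_syn : network predictions and physics-model outputs on synthetic events
    lam_syn, lam_phys : hypothetical weights of the synthetic and constraint terms
    """
    data_term = np.mean((q_pred_obs - q_obs) ** 2)        # fit to observed discharge
    synthetic_term = np.mean((q_pred_syn - q_syn) ** 2)   # fit to physics-generated data
    # Penalise negative discharge predictions, since runoff cannot be negative
    all_pred = np.concatenate([q_pred_obs, q_pred_syn])
    constraint_term = np.mean(np.maximum(-all_pred, 0.0) ** 2)
    return data_term + lam_syn * synthetic_term + lam_phys * constraint_term
```

In a deep-learning framework the same expression would be written with differentiable tensor operations so that gradients flow through all three terms; the relative weights control how strongly the synthetic, physics-based information pulls the network away from the observations.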
Figure 6. Daily Runoff Simulation Results of the SWAT Model during the Validation Period.
Figure 9. Comparison of Multiple Hybrid Model Results during the Flood Period.
Figure 10. Global SHAP Contribution Plot of Different Coupled Models.
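Figure 10 ranks input variables by their global SHAP contribution. SHAP values are Shapley values of a model's prediction, and a common global score is the mean absolute SHAP value per feature. The article does not state in this section how its values were computed; the sketch below computes them exactly by enumerating feature subsets, which is feasible only for a handful of features, and simplifies by filling "absent" features from a single baseline vector rather than averaging over a background dataset as the `shap` library does. The functions and the toy model are hypothetical.

```python
import itertools
import math

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1 drawn from the other features
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                z = baseline.astype(float).copy()
                idx = list(subset)
                z[idx] = x[idx]
                without_i = f(z)
                z[i] = x[i]
                phi[i] += weight * (f(z) - without_i)  # weighted marginal contribution
    return phi


def global_importance(f, X, baseline):
    """Mean absolute Shapley value per feature over the rows of X."""
    return np.mean([np.abs(exact_shapley(f, row, baseline)) for row in X], axis=0)


def toy_model(v):
    """Stand-in for a trained runoff model, with one interaction term."""
    return 2.0 * v[0] + 0.5 * v[1] + v[0] * v[2]


X = np.random.default_rng(0).random((20, 3))
baseline = X.mean(axis=0)
importance = global_importance(toy_model, X, baseline)
```

The exact values satisfy the efficiency property (they sum to the prediction minus the baseline prediction), which is what lets a SHAP plot split each forecast into per-variable contributions.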
Table 1. Different schemes integrating the Xinanjiang model.
Model | Soil Moisture | Surface Runoff | Outlet Discharge | Scheme ID
GRU XAJ-GRU-1
XAJ-GRU-2
XAJ-GRU-3
Seq2seq XAJ-Seq-1
XAJ-Seq-2
XAJ-Seq-3
Note: “√” indicates that the corresponding variable is included in the scheme.
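Table 1 adds Xinanjiang intermediate variables to the data-driven inputs. The sketch below shows one plausible way to do this, assuming daily series stored as NumPy arrays: the selected physical-model outputs are stacked as extra columns next to the meteorological inputs, and sliding windows of length `lookback` are cut to predict next-day discharge. The function name, the window length, and the one-day horizon are illustrative assumptions, not settings reported in the article.

```python
import numpy as np


def build_hybrid_windows(met, phys, q_obs, lookback=7):
    """Stack meteorological and physical-model features and cut sliding windows.

    met   : (T, n_met) meteorological inputs, e.g. rainfall and evaporation
    phys  : (T, n_phys) process-model outputs selected by a scheme,
            e.g. soil moisture, surface runoff, simulated outlet discharge
    q_obs : (T,) observed discharge used as the target
    Returns X of shape (T - lookback, lookback, n_met + n_phys) and y of shape (T - lookback,).
    """
    features = np.hstack([met, phys])  # one row of hybrid features per day
    n = len(q_obs) - lookback
    X = np.stack([features[t:t + lookback] for t in range(n)])
    y = q_obs[lookback:]  # discharge on the day after each window
    return X, y


# Toy example: 30 days, 2 meteorological inputs, 3 physical-model outputs
rng = np.random.default_rng(0)
X, y = build_hybrid_windows(rng.random((30, 2)), rng.random((30, 3)), rng.random(30))
```

Switching between schemes then amounts to choosing which physical-model columns go into `phys`; the same helper applies unchanged to the SWAT outputs of Table 2.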
Table 2. Different schemes integrating the SWAT model.
Model | Soil Moisture | Surface Runoff | Outlet Discharge | Scheme ID
GRU SWAT-GRU-1
SWAT-GRU-2
SWAT-GRU-3
Seq2seq SWAT-Seq-1
SWAT-Seq-2
SWAT-Seq-3
Note: “√” indicates that the corresponding variable is included in the scheme.
Table 3. Different schemes based on physical mechanisms.
Model | Rainfall Distribution Type 1 | Rainfall Distribution Type 2 | Rainfall Distribution Type 3 | Scheme ID
GRU PG-GRU-1
PG-GRU-2
PG-GRU-3
PG-GRU-4
Seq2seq PG-Seq-1
PG-Seq-2
PG-Seq-3
PG-Seq-4
Note: “√” indicates that the corresponding variable is included in the scheme.
Table 4. Basin Daily Runoff Simulation Results.
Model | Period | NSE | MAE | RMSE
Xinanjiang | Calibration | 0.829 | 16.798 | 39.670
Xinanjiang | Validation | 0.840 | 16.777 | 43.429
SWAT | Calibration | 0.806 | 18.788 | 44.547
SWAT | Validation | 0.825 | 17.568 | 46.012
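Tables 4, 6, and 7 report NSE, MAE, and RMSE. For reference, the sketch below implements their standard definitions: NSE = 1 − Σ(Q_obs − Q_sim)² / Σ(Q_obs − mean(Q_obs))², MAE = mean|Q_obs − Q_sim|, and RMSE = sqrt(mean((Q_obs − Q_sim)²)). The article's own evaluation code is not shown, and the short series below are toy numbers, not basin data.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def mae(obs, sim):
    """Mean absolute error, in the units of discharge."""
    return float(np.mean(np.abs(np.asarray(obs, float) - np.asarray(sim, float))))


def rmse(obs, sim):
    """Root-mean-square error, in the units of discharge; penalises large errors more than MAE."""
    return float(np.sqrt(np.mean((np.asarray(obs, float) - np.asarray(sim, float)) ** 2)))


obs = [10.0, 50.0, 200.0, 80.0, 20.0]
sim = [12.0, 45.0, 180.0, 85.0, 18.0]
scores = (nse(obs, sim), mae(obs, sim), rmse(obs, sim))
```

Because RMSE squares the errors, a single missed flood peak (the 20-unit error above) dominates it, which is why RMSE and MAE can rank models differently.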
Table 5. Different Model Schemes.
Model | Rainfall | Evaporation | Meteorological Features | Historical Runoff | Scheme ID
SVR SVR-1
SVR-2
SVR-3
SVR-4
SVR-5
XGB XGB-1
XGB-2
XGB-3
XGB-4
XGB-5
GRU GRU-1
GRU-2
GRU-3
GRU-4
GRU-5
Seq2seq Seq2seq-1
Seq2seq-2
Seq2seq-3
Seq2seq-4
Seq2seq-5
Note: “√” indicates that the corresponding variable is included in the scheme.
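Table 5 varies which inputs each model receives, including historical runoff. Tabular learners such as SVR and XGBoost cannot consume a sequence directly, so lagged values are usually flattened into one feature row per forecast day. The sketch below assumes that layout; the lag depth, the column order, and the helper name are hypothetical, since the article's exact lag structure is not given in this section.

```python
import numpy as np


def lagged_design_matrix(series, lags):
    """Flatten lagged inputs into a 2-D design matrix for SVR or XGBoost.

    series : dict mapping feature name -> 1-D array of daily values (equal lengths)
    lags   : number of antecedent days to include for every feature
    Row t holds values from days t .. t+lags-1 and is meant to predict day t+lags.
    """
    length = len(next(iter(series.values())))
    columns, names = [], []
    for name, values in series.items():
        values = np.asarray(values, float)
        for lag in range(lags, 0, -1):  # oldest lag first
            columns.append(values[lags - lag:length - lag])
            names.append(f"{name}_t-{lag}")
    return np.column_stack(columns), names


# Toy example with random daily series (not basin data)
rng = np.random.default_rng(0)
daily = {"rainfall": rng.random(60), "evaporation": rng.random(60), "runoff": rng.random(60)}
X, feature_names = lagged_design_matrix(daily, lags=3)
y = daily["runoff"][3:]  # next-day discharge aligned with each row
```

Including the `runoff` key corresponds to a scheme with historical runoff; dropping it from the dictionary gives the purely meteorological variant.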
Table 6. Runoff Prediction Accuracy Evaluation of the Best Schemes for Different Models—Part I.
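Model Scheme | Training NSE | Training RMSE | Training MAE | Validation NSE | Validation RMSE | Validation MAE
SVR-3 | 0.769 | 48.165 | 14.352 | 0.723 | 62.315 | 15.844
XGB-3 | 0.887 | 33.652 | 13.629 | 0.844 | 46.101 | 16.001
GRU-3 | 0.925 | 27.337 | 12.977 | 0.846 | 46.426 | 14.973
Seq2seq-3 | 0.937 | 25.197 | 11.462 | 0.859 | 44.393 | 13.242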
Table 7. Runoff Prediction Accuracy Evaluation of the Best Schemes for Different Models—Part II.
Model Scheme | Training NSE | Training RMSE | Training MAE | Validation NSE | Validation RMSE | Validation MAE
GRU-3 | 0.925 | 27.337 | 12.977 | 0.846 | 46.426 | 14.973
Seq2seq-3 | 0.937 | 25.197 | 11.462 | 0.859 | 44.393 | 13.242
XAJ-GRU-3 | 0.879 | 34.875 | 14.436 | 0.881 | 40.804 | 15.101
XAJ-Seq2seq-2 | 0.911 | 29.796 | 12.369 | 0.912 | 35.071 | 12.665
SWAT-GRU-3 | 0.908 | 30.318 | 15.046 | 0.883 | 40.431 | 16.516
SWAT-Seq2seq-1 | 0.910 | 30.043 | 13.383 | 0.912 | 35.067 | 15.368
PG-GRU-4 | 0.929 | 26.658 | 14.052 | 0.872 | 42.326 | 17.338
PG-Seq2seq-4 | 0.924 | 27.633 | 11.614 | 0.898 | 39.075 | 13.025
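Table 7 supports a direct comparison between each standalone network and its hybrid counterparts. The short calculation below turns the validation-period RMSE values in the table into relative reductions; only numbers from Table 7 are used, and the helper name is illustrative.

```python
def relative_reduction(baseline, improved):
    """Fractional decrease of an error metric relative to a baseline value."""
    return (baseline - improved) / baseline


# Validation-period RMSE values from Table 7
standalone = {"GRU": 46.426, "Seq2seq": 44.393}
hybrids = {
    "XAJ-GRU-3": ("GRU", 40.804),
    "SWAT-GRU-3": ("GRU", 40.431),
    "PG-GRU-4": ("GRU", 42.326),
    "XAJ-Seq2seq-2": ("Seq2seq", 35.071),
    "SWAT-Seq2seq-1": ("Seq2seq", 35.067),
    "PG-Seq2seq-4": ("Seq2seq", 39.075),
}
for name, (base, value) in hybrids.items():
    pct = 100 * relative_reduction(standalone[base], value)
    print(f"{name}: validation RMSE reduced by {pct:.1f}%")
```

By this measure the Xinanjiang- and SWAT-coupled Seq2seq schemes cut validation RMSE by about 21.0% relative to Seq2seq-3, the coupled GRU schemes by about 12.1% and 12.9% relative to GRU-3, and the physics-guided schemes by about 12.0% (Seq2seq) and 8.8% (GRU).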
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, M.; Yao, T.; Gu, H.; Wang, W.; Pan, L.; Gu, H.; Pei, Y.; Lu, B. A Hybrid Runoff Forecasting Framework Integrating Hydrological Physics and Data-Driven Models. Sustainability 2025, 17, 11120. https://doi.org/10.3390/su172411120

Chicago/Turabian Style

Zhang, Muzi, Tailun Yao, Hongbin Gu, Weiwei Wang, Linying Pan, Huanghe Gu, Ying Pei, and Baohong Lu. 2025. "A Hybrid Runoff Forecasting Framework Integrating Hydrological Physics and Data-Driven Models" Sustainability 17, no. 24: 11120. https://doi.org/10.3390/su172411120

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
