Physics-Informed Data-Driven Models for Streamflow Prediction in Small Catchments: Combining Hydrological Causality and Machine Learning Frameworks

Galán, Victor; Navas, Rafael; Zubelzu, Sergio

doi:10.3390/su18136381

by Victor Galán ¹, Rafael Navas ² and Sergio Zubelzu ¹,*

¹ Departamento de Ingeniería Agroforestal, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, 28040 Madrid, Spain

² Departamento del Agua, CENUR—Litoral Norte, Universidad de la República, Salto 50000, Uruguay

* Author to whom correspondence should be addressed.

Sustainability 2026, 18(13), 6381; https://doi.org/10.3390/su18136381 (registering DOI)

Submission received: 18 May 2026 / Revised: 10 June 2026 / Accepted: 15 June 2026 / Published: 23 June 2026

Abstract

Accurate streamflow prediction in small catchments remains challenging due to their rapid response times, threshold-driven behaviors, and high spatial heterogeneity. This study develops and evaluates a novel modeling approach combining physics-informed feature selection with machine learning algorithms. Overall, 1825 model configurations were tested across fifteen algorithms (including Random Forest, XGBoost, LightGBM, CatBoost, Support Vector Machines, and deep learning methods) using multiple physics-informed input structures based on classical rainfall–runoff theory and mass balance conservation. Models were evaluated for predicting minimum, average, and maximum daily water levels and discharge. Results demonstrate that models structured around Green–Ampt infiltration assumptions consistently outperformed alternative configurations, with Random Forest achieving good performance for water level predictions. Causal models outperformed autoregressive approaches, while the residual analysis revealed limitations in predicting extreme values. Feature importance analysis revealed that channel and catchment morphology and initial soil moisture conditions were the dominant predictors, in line with hydrological process understanding.

Keywords: hydrology; streamflow; machine learning; physically based models

1. Introduction

Accurate prediction of streamflow and outflow at the catchment scale is a central problem in hydrology, underpinning water resources management, flood forecasting, drought assessment, and ecohydrological studies. Small catchments pose persistent challenges due to their strong nonlinearity, rapid hydrological response and sensitivity to local heterogeneities. Two broad and complementary modeling paradigms have traditionally been used for catchment-scale outflow prediction, namely physically based and data-driven models. More recently, physics-informed data-driven approaches have emerged to bridge the gap between these paradigms by integrating physical knowledge into machine learning frameworks [1]. Physically based hydrological models aim to represent catchment processes using governing equations derived from conservation laws of mass, momentum, and energy [2]. The main advantage of physical and conceptual models lies in their interpretability and consistency with hydrological theory.

Data-driven approaches seek to model the relationship between hydrometeorological inputs and catchment outflow directly from data [3]. The primary strength of data-driven models lies in their flexibility and ability to capture complex nonlinear relationships. Despite these advantages, purely data-driven models generate controversy among scientists [4]. They typically lack physical interpretability, do not guarantee consistency with hydrological laws and on many occasions the relationships between inputs and outputs are spurious [5]. Their extrapolation capability beyond the training domain is limited, and they may produce physically implausible results [6]. In such contexts, purely data-driven models tend to struggle to adequately represent the full range of hydrological variability and frequently fail to capture the most hydrologically relevant states, namely extreme low-flow and high-flow events.

Seeking to overcome those issues, physics-informed models aim to retain the predictive power and flexibility of data-driven approaches while improving interpretability, robustness, and generalization. The number of published papers presenting physics-informed approaches has grown considerably in recent years, and the proposed methods vary widely. For example, Parisouj et al. [7] combined data-driven algorithms with HEC-HMS, finding that a pure ML algorithm (LSTM) performed better than the combined architecture; Lu et al. [8] fed LSTM networks with outputs from the physically based PRMS-IV model, finding that the hybrid model better predicted the outflow when the variability in the data increased; Zhong et al. [9,10] proposed a complex modelling approach where data-driven approaches and physics interact at different levels, from runoff estimation used to feed bucket-based hydrological models to the estimation of the parameters involved in the kinematic-wave solution for the Muskingum–Cunge integration of the Saint-Venant equations; Zhao et al. [11] used time-series modelling to predict streamflow data simulated with physical models using a mixed (records-estimates) dataset; Zhang et al. [12] fed data-driven approaches with intermediate variables simulated by physical models, seeking to incorporate explicit physical meaning; and Liu et al. [13] also used the outputs of physical modelling to feed LSTM networks. A complete review of the latest trends in physics-informed models for hydrology can be found in Xu et al. [14].

Previous approaches mostly combine physical and data-driven models as separate blocks, either feeding one with the outputs of the other or using one to apparently improve the outputs of the other. The work presented in this manuscript departs from such approaches by forcing data-driven algorithms to replicate existing theoretical models, under the assumption that physical conceptual models capture physical reality. This approach should increase the model’s reliability and its ability to reproduce episodes that are scarcely represented in the data series.

On this conceptual ground, this manuscript targets the rainfall–streamflow process and proposes a physics-informed data-driven framework for predicting outflow in small catchments, in which machine learning algorithms do not complement but rather replicate physical causality. The aim is to maintain theoretical robustness and hydrological consistency while achieving high predictive performance.

2. Materials and Methods

Our ambition is to build machine learning algorithms mimicking physical relationships as given by existing physical theories for modeling event-based hydrological processes. To implement this idea, the following steps were followed: (1) conceptually define relationships between inputs and outputs following physical laws; (2) gather the inputs required for building the machine learning algorithms, including time series data and physical parameters involved; (3) define the causal relationships between the inputs and the outputs based on the conceptual relationships inferred and the data availability; and finally, (4) find the optimal machine learning algorithms.

2.1. General Framework

The theoretical framework used to define the data-driven models is based on the general law of mass conservation within the catchment, given by Equations (1) and (2).

P(t) = R(t) + I(t),

(1)

dS·dt⁻¹ = I(t) − ETk(t),

(2)

where S is the system internal variation (assumed in this context to be the soil moisture variation Δθ), P is precipitation, R is surface runoff, ETk is evapotranspiration, I is infiltration and t is time.

Building on the general framework defined by Equations (1) and (2), the expressions presented below define conceptual feature groups and causal relationships rather than physically complete governing equations.

Given that surface runoff ultimately produces the catchment’s downstream-end point outflow (Q), Equations (1) and (2) can be manipulated to derive theoretical relationships between the outflow and the independent variables as follows (Equations (3a) and (3b)).

Q = f(P, I),

(3a)

Q = f(P, Δθ, ETk),

(3b)

Equations (3a) and (3b) are expanded to deduce the causal relationships used to build data-driven approaches in this manuscript as follows.

1. Starting with Equation (3a), the infiltration can be simulated through existing physical or conceptual theories such as the Green–Ampt model [15] or the SCS curve number (CN) method [16]. Following this idea, Equation (4) (Green–Ampt model) and Equation (5) (CN model) present the theoretical relationships, customized to incorporate the parameters involved in each model.

I = f(θ₀, ks, θs, θr, α, m, τf),

(4)

I = f(CN, θ₀),

(5)

where θ₀ is the initial soil water content; ks, θs, θr, α, and m are the parameters of the van Genuchten [17] and Mualem [18] equations for the conductivity and water retention curves; and τf is the wetting front suction head.

Combining Equation (3a) with Equations (4) and (5) yields the conceptual relationships expressed in Equations (6) and (7), corresponding to the Green–Ampt and CN approaches, respectively. These equations describe the relationships between the catchment outflow and the primary variables that serve as the basis for the data-driven models developed in this study.

Q = f(P, θ₀, ks, θs, θr, τf, ϕ),

(6)

Q = f(P, CN, θ₀, ϕ),

(7)

In Equations (6) and (7), the function ϕ accounts for the processes involved in the transfer from runoff generation to flow routing at the catchment outlet. Assuming a Saint-Venant kinematic-wave approach for flow routing, ϕ can be conceptually represented by Expression (8).

ϕ = f(n, L, z_M, z_m, w),

(8)

where n is the Manning’s roughness coefficient, L is the channel length and z_M and z_m are the catchment’s highest and lowest elevations, while w accounts for the channel area.

2. Proceeding from Equation (3b), the dependencies between the evapotranspiration and the independent variables are expressed in Equations (9a) and (9b).

ETk = f(SR, wv, T, HR, θfc, θpwp),

(9a)

ETk = f(ET0, θfc, θpwp),

(9b)

Here, ET0 represents potential evapotranspiration, SR is solar radiation, wv is wind velocity, T is temperature, HR is relative humidity and θfc and θpwp are the soil moisture contents at field capacity and permanent wilting point, respectively.

Combining Equations (9a) and (9b) with Equation (3b) yields Expressions (10a) and (10b), which capture the physical causal relationships between the catchment outflow and the associated physically relevant variables according to the framework defined by Equation (3b).

Q = f(P, Δθ, SR, wv, T, HR, θfc, θpwp, ϕ),

(10a)

Q = f(P, Δθ, ET0, θfc, θpwp, ϕ),

(10b)
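The four input structures in Equations (6), (7), (10a) and (10b) can be read as named feature groups. The sketch below encodes them as plain Python lists, with ϕ expanded through Expression (8). The column names and the `select_features` helper are hypothetical placeholders introduced only for illustration; the actual variable names are those listed in Tables 2 and 3.

```python
# Hypothetical column names; phi is expanded following Expression (8).
ROUTING = ["n", "L", "z_M", "z_m", "w"]

FEATURE_GROUPS = {
    # Equation (6): Green-Ampt based structure
    "eq6_green_ampt": ["P", "theta_0", "k_s", "theta_s", "theta_r", "tau_f"] + ROUTING,
    # Equation (7): curve number based structure
    "eq7_curve_number": ["P", "CN", "theta_0"] + ROUTING,
    # Equation (10a): mass balance with raw meteorological drivers
    "eq10a_meteo": ["P", "delta_theta", "SR", "wv", "T", "HR",
                    "theta_fc", "theta_pwp"] + ROUTING,
    # Equation (10b): mass balance with potential evapotranspiration
    "eq10b_et0": ["P", "delta_theta", "ET0", "theta_fc", "theta_pwp"] + ROUTING,
}


def select_features(record, group):
    """Return the input values of one feature group from a dict-like record."""
    return [record[name] for name in FEATURE_GROUPS[group]]
```

Keeping the groups in one dictionary makes it straightforward to loop over the same algorithm with each physics-informed input structure.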

2.2. Case Studies and Data Availability

A set of 20 catchments within the Spanish Ebro River basin was selected for this study (Table 1 provides the names of the rivers and the locations of the corresponding streamflow gauging stations, while Figure 1 shows the spatial distribution of the catchments). The selected catchments met the following criteria:

(a): Small watersheds that can be readily parameterized from a physical perspective, facilitating the analysis of isolated hydrological processes such as rainfall, infiltration, and runoff.
(b): Absence of internal storage structures, water abstractions, or inter-basin water transfers, thereby ensuring compliance with the law of mass conservation.
(c): Presence of a single, clearly identifiable main channel.
(d): Availability of sufficiently long and high-temporal-resolution historical records to enable the development of representative data-driven models.

To feed the models presented in Section 2.1, daily time-series data covering five hydrological years, from 1 October 2020 to 30 September 2025, were collected. In addition, several synthetic variables potentially relevant for hydrological modeling were derived (see Table 2).

Precipitation, water level, and outflow data were obtained from the SAIH Ebro service [19]. The remaining meteorological variables (temperature, relative humidity, wind velocity, solar radiation, and evapotranspiration) were collected from the nearest stations belonging to the SIAR [20], Catalonian [21], or Basque Country [22] monitoring networks. Soil moisture data were obtained from the SMAP product [23], with a temporal resolution of 3 h and a spatial resolution of 9 km. For each catchment, the grid point closest to the catchment centroid was selected, and the 3-hourly observations were averaged to obtain representative daily values. The meteorological data referred to in Table 2 had been quality controlled by the corresponding institutions before download to ensure continuity and the absence of spurious values. Overall, 1826 daily observations were available for each variable, resulting in a total of 36,520 daily records per feature across the 20 catchments considered in this study.
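The record counts quoted above follow directly from the study period, which includes the leap day of 29 February 2024. The short sketch below reproduces the arithmetic and shows one simple way to average 3-hourly soil moisture values into a daily value; the `daily_mean` helper is hypothetical and assumes a complete day of eight values, which may differ from how gaps were actually handled.

```python
from datetime import date

start, end = date(2020, 10, 1), date(2025, 9, 30)
n_days = (end - start).days + 1          # inclusive of both end dates
n_catchments = 20
total_records = n_days * n_catchments    # daily records per feature


def daily_mean(three_hourly):
    """Average the eight 3-hourly values of one day (assumes no gaps)."""
    if len(three_hourly) != 8:
        raise ValueError("expected 8 values per day at 3 h resolution")
    return sum(three_hourly) / len(three_hourly)
```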

Several synthetic variables were incorporated into the analysis to capture temporal patterns and variability that could improve the prediction of extreme events. These transformations were expected to reduce the influence of short-term fluctuations while preserving the information associated with extreme hydrological episodes.

In addition to the time-series data, physical and territorial information was processed to estimate the parameters required for the models presented in Section 2.1. Table 3 summarizes the resulting information, while a detailed description of the procedures used to derive these parameters can be found in [24]. The complete set of physical and territorial parameter values is provided in the Supplementary Materials.

2.3. Model Configuration

By combining the available data (Section 2.2) with the general model configurations defined in Section 2.1, a set of data-driven algorithms was developed to emulate the causal relationships described by Equations (6), (7), (10a) and (10b). Two different approaches were followed to construct the data-driven models:

1. Direct incorporation of physical relationships into the causal models according to Equations (6), (7), (10a) and (10b). This approach resulted in the set of models presented in Table 4.

2. Autoregressive models using (a) the entire set of catchments, and (b) clusters of catchments defined based on their physical and territorial parameters, following either the Green–Ampt approach (clusters created using the Green–Ampt-related variables presented in Table 2 to group the catchments) or the Curve Number method (clusters created using the CN-related variables presented in Table 2 to group the catchments). In the second approach, the algorithm is constrained by physical criteria by grouping data according to the similarity of the parameters involved in each theoretical framework. The autoregressive models are displayed in Table 5.
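Grouping catchments by the similarity of their physical parameters can be sketched as follows. The text does not state which clustering algorithm or how many clusters were used, so k-means with three clusters is purely an assumption here, and the parameter matrix is synthetic random data standing in for the real catchment parameters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in: 20 catchments x 3 parameters (e.g. CN-related); not real data.
params = rng.normal(size=(20, 3))

# Standardize so that no single parameter dominates the distance metric.
scaled = StandardScaler().fit_transform(params)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

# Map each cluster label to the indices of its catchments; one autoregressive
# model would then be fitted per cluster.
clusters = {int(k): np.flatnonzero(labels == k).tolist() for k in np.unique(labels)}
```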

2.4. Data-Driven Algorithms

The following algorithms have been used.

Linear Regression (LR): Models the relationship between input features and the target variable by fitting a linear equation that minimizes the sum of squared residuals.

Ridge Regression (RR): An extension of linear regression that adds L2 regularization by penalizing large coefficient values, which helps prevent overfitting when dealing with multicollinear features common in hydrological data.

Lasso Regression (LSR): Similar to Ridge but employs L1 regularization, which can shrink some coefficients exactly to zero, effectively performing automatic feature selection. This is particularly useful in streamflow modeling when dealing with numerous potential predictors, as it identifies the most relevant hydrological variables while discarding redundant ones.

Support Vector Machine with Linear Kernel (SVML): A machine learning algorithm that finds the optimal hyperplane separating data in feature space by maximizing the margin between predictions. The linear kernel assumes linear relationships between inputs and streamflow, similar to linear regression but with different optimization objectives and robust handling of outliers through support vectors.

Support Vector Machine with RBF Kernel (SVMR): Uses a radial basis function kernel that maps inputs into a higher-dimensional space, enabling the capture of complex nonlinear relationships between meteorological forcing and catchment response. The RBF kernel can model localized patterns in the data, making it suitable for threshold-driven hydrological processes characteristic of small catchments.

Random Forest (RF): An ensemble learning method that constructs multiple decision trees during training and outputs the average prediction across all trees. Each tree is trained on a bootstrap sample of the data with random feature subsets at each split, providing robust predictions, natural feature importance rankings, and the ability to capture nonlinear interactions between hydrological variables without extensive hyperparameter tuning.

Gradient Boosting (GB): An ensemble technique that builds trees sequentially, where each new tree corrects errors made by the previous ensemble. It combines weak learners (shallow trees) into a strong predictive model by optimizing a loss function through gradient descent, offering high predictive accuracy for streamflow forecasting when properly regularized to prevent overfitting.

Extreme Gradient Boosting (XGB): An optimized implementation of gradient boosting that incorporates regularization terms, handles missing data natively, and uses parallel processing for computational efficiency. It includes advanced features like tree pruning, weighted quantile sketch for approximate learning, and sparsity-aware algorithms, making it particularly effective for hydrological datasets with irregular measurements or gaps.

LightGBM (LGBM): A gradient boosting framework that uses histogram-based algorithms and grows trees leaf-wise rather than level-wise, achieving faster training speeds and lower memory usage. Its efficient handling of large datasets and categorical features makes it well-suited for catchment-scale modeling with high-resolution temporal data or when incorporating land use classifications.

CatBoost (CB): A gradient boosting algorithm specifically designed to handle categorical features without extensive preprocessing, using ordered boosting and novel techniques to reduce overfitting. It addresses prediction shift problems and can automatically process features like season, month, or catchment characteristics, which is valuable when incorporating non-numeric hydrological descriptors into streamflow models.

Autoregressive Integrated Moving Average (ARIMA): A time series forecasting method that combines three components: autoregression (AR) using past streamflow values, differencing (I) to achieve stationarity by removing trends, and moving average (MA) of past forecast errors. ARIMA models capture temporal dependencies and trends in streamflow sequences, making them suitable for short-term predictions, though they assume linear relationships and may struggle with complex nonlinear hydrological dynamics or incorporating exogenous meteorological variables.

Seasonal Autoregressive Integrated Moving Average (SARIMA): An extension of ARIMA that explicitly accounts for seasonal patterns by adding seasonal AR, differencing, and MA components with specific lags corresponding to the seasonal period. This is particularly relevant for streamflow modeling where annual cycles in precipitation, snowmelt, and evapotranspiration create recurring patterns, allowing the model to capture both short-term dynamics and long-term seasonal variations in catchment response.

Long Short-Term Memory network (LSTM): A specialized recurrent neural network architecture designed to learn long-term dependencies in sequential data through memory cells and gating mechanisms (input, forget, and output gates) that regulate information flow. LSTMs excel at capturing complex temporal patterns in hydrological time series, including prolonged influences like antecedent soil moisture conditions or delayed snowmelt contributions, while mitigating the vanishing gradient problem that limits standard recurrent networks in learning extended lag relationships between rainfall events and streamflow response.

Multi-Layer Perceptron network (MLP): A feedforward artificial neural network consisting of an input layer, one or more hidden layers with nonlinear activation functions, and an output layer, trained through backpropagation to minimize prediction error. MLPs can approximate complex nonlinear mappings between meteorological inputs and streamflow output through their layered architecture, though unlike LSTMs they do not inherently capture temporal dependencies and typically require manual feature engineering to incorporate time-lagged variables and antecedent conditions relevant to catchment hydrological memory.

Prior to model training, all predictor variables were standardized to have zero mean and unit variance. Model optimization was based on minimizing the MAE metric. Each algorithm was evaluated under the configurations defined in Section 2.3, and the best-performing models were selected for further fine-tuning. This was carried out using a systematic grid-search approach, in which a predefined set of hyperparameter combinations was independently evaluated for each machine learning algorithm (see Table 6 for a summary of the main hyperparameter tuning settings).
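A minimal sketch of this workflow with scikit-learn is given below: standardization and the regressor are chained in a pipeline, and a grid search selects the hyperparameters that minimize MAE. The Random Forest regressor, the grid values and the synthetic data are assumptions chosen for illustration; they are not the settings of Table 6.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))  # synthetic predictors, not real data
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

# Scaling inside the pipeline is refitted on each training fold,
# which avoids leaking validation statistics into training.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestRegressor(random_state=0)),
])

# Illustrative grid only.
param_grid = {"model__n_estimators": [50, 100], "model__max_depth": [3, None]}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="neg_mean_absolute_error",  # scikit-learn maximizes, hence the sign
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
best_mae = -search.best_score_
```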

To evaluate model robustness and predictive capability, two different cross-validation strategies were used. The first approach employed a conventional random K-fold split, in which samples from different days were randomly assigned to training and validation subsets (this approach was used to evaluate the physically based models presented in Table 4). This implies that the models use input–output pairs sampled from randomly selected days, such that the date has no influence on model structure or predictions; consequently, the models are not sensitive to the temporal ordering of the observations. In each K-fold iteration, training and validation sets differ, allowing the model to learn from multiple training samples and be evaluated on different validation subsets.

The second approach used a TimeSeriesSplit strategy, where complete temporal sequences were preserved during training and testing in order to account for temporal autocorrelation and better represent realistic forecasting conditions (this approach was used to evaluate the autoregressive methods presented in Table 5). In this case, since time is a relevant factor in model construction, the split between training and validation datasets was defined to ensure there is no temporal overlap between them.
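The difference between the two validation strategies can be seen on a toy index of consecutive days. The sketch below uses scikit-learn's `KFold` and `TimeSeriesSplit`; the number of days and folds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

days = np.arange(20)  # stand-in for 20 consecutive daily records

kfold = KFold(n_splits=4, shuffle=True, random_state=0)
tscv = TimeSeriesSplit(n_splits=4)

# Random K-fold: in at least one fold, training days come after validation days.
kfold_interleaved = any(train.max() > val.min() for train, val in kfold.split(days))

# TimeSeriesSplit: every training day precedes every validation day.
ts_ordered = all(train.max() < val.min() for train, val in tscv.split(days))
```

With the random split, temporal ordering plays no role, which suits the causal models; with the time-ordered split, the model is always validated on days later than those it was trained on.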

2.5. Performance Metrics

The following metrics were used for evaluating the performance of each model.

RMSE (Root Mean Square Error): Measures the square root of the average squared differences between predicted and observed streamflow values (Equation (11)).

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(x_{\mathrm{obs},i} - x_{\mathrm{sim},i}\right)^{2}},

(11)

where x_obs,i and x_sim,i account for the ith records and simulations, respectively.

MAE (Mean Absolute Error): Calculates the average absolute difference between predicted and observed values (Equation (12)).

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|x_{\mathrm{obs},i} - x_{\mathrm{sim},i}\right|,

(12)

NSE (Nash–Sutcliffe Efficiency): A normalized statistic that compares the model’s predictive skill to that of the mean of the observed values, x_{\mathrm{obs}}^{\mathrm{av}} (Equation (13)).

\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} \left(x_{\mathrm{obs},i} - x_{\mathrm{sim},i}\right)^{2}}{\sum_{i=1}^{n} \left(x_{\mathrm{obs},i} - x_{\mathrm{obs}}^{\mathrm{av}}\right)^{2}},

(13)

KGE (Kling–Gupta Efficiency): A comprehensive metric that decomposes model performance into correlation, bias, and variability components (Equation (14)).

\mathrm{KGE} = 1 - \sqrt{(r - 1)^{2} + (\alpha - 1)^{2} + (\beta - 1)^{2}},

(14)

where r is the linear correlation coefficient between observed and simulated values, α is the ratio of standard deviations (variability ratio), and β is the ratio of means (bias ratio).

BIAS: Quantifies the systematic tendency of the model to over- or under-predict the variable, expressed as a percentage of the mean observations (Equation (15)).

\mathrm{BIAS}\,(\%) = \frac{\sum_{i=1}^{n} \left(x_{\mathrm{sim},i} - x_{\mathrm{obs},i}\right)}{\sum_{i=1}^{n} x_{\mathrm{obs},i}} \times 100,

(15)

MAPE (Mean Absolute Percentage Error): Expresses prediction errors as a percentage of observed values, providing a scale-independent measure useful for comparing performance across catchments with different magnitudes (Equation (16)).

\mathrm{MAPE}\,(\%) = \frac{100}{n} \sum_{i=1}^{n} \left|\frac{x_{\mathrm{obs},i} - x_{\mathrm{sim},i}}{x_{\mathrm{obs},i}}\right|,

(16)

After evaluating different strategies during the investigation, the MAE metric was selected to identify the best-performing models. Besides the remaining indicators presented in this manuscript, other more complex metrics (such as weighted combinations of different performance measures) were also analyzed; however, MAE was ultimately adopted due to its simplicity, its clear interpretation in relative terms, and its direct physical meaning in hydrological applications. Similarly, although it is well known that MAPE is highly sensitive to zero values, it can still complement the information provided by other indicators by expressing the average error relative to observed values.
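For reference, Equations (11) to (16) can be written as one small Python function. This is a sketch rather than the code used in the study: the `metrics` name is hypothetical, and the KGE ratios are taken as simulated over observed, which is the common convention but is an assumption here because the text does not state the direction. The function also assumes non-zero observations, since MAPE is undefined otherwise.

```python
import math


def metrics(obs, sim):
    """Compute RMSE, MAE, NSE, KGE, BIAS (%) and MAPE (%) per Equations (11)-(16)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    err = [o - s for o, s in zip(obs, sim)]
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)

    rmse = math.sqrt(sum(e * e for e in err) / n)
    mae = sum(abs(e) for e in err) / n
    nse = 1 - sum(e * e for e in err) / ss_obs

    # KGE components (assumed direction: simulated over observed).
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    r = cov / math.sqrt(ss_obs * ss_sim)
    alpha = math.sqrt(ss_sim / ss_obs)
    beta = mean_sim / mean_obs
    kge = 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

    # Equation (15): simulated minus observed, so positive means overprediction.
    bias = 100 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)
    mape = 100 / n * sum(abs((o - s) / o) for o, s in zip(obs, sim))
    return {"RMSE": rmse, "MAE": mae, "NSE": nse, "KGE": kge,
            "BIAS": bias, "MAPE": mape}
```

For example, `metrics([1, 2, 3, 4], [2, 3, 4, 5])` describes a simulation with a constant offset of one unit: RMSE and MAE both equal 1, while the BIAS is positive because every simulated value exceeds the observation.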

3. Results

The models presented in Table 4 and Table 5 were implemented using the algorithms described in Section 2.4, resulting in 1825 model configurations. Table 7 presents the best-performing models in terms of minimum MAE for each target output (bold values indicate the best-performing approach for each output).

As evidenced by the results presented in Table 7, the approach defined in Equation (6) consistently outperforms the remaining theoretical approaches. The results in Table 7 also show that approaches based on Equations (6) and (7) yielded the best performance metrics, suggesting that models based on classical rainfall–runoff formulations (Equation (1)), rather than those based on mass-balance conservation (Equation (2)), provide better predictions of streamflow. It should be noted that the performance of the autoregressive models that group catchments using variables derived from the physical and conceptual models used in this study is relatively close to that of the best-performing models.

As also revealed by the data included in Table 7, better performance was achieved when predicting water level than when predicting discharge. Although both variables are causally related, water level is directly measured, whereas discharge is derived from water level. This may indicate uncertainties associated with discharge estimation, despite the fact that the data used in this study were obtained directly from a public repository.

Complementing the information presented in Table 7, Table 8 reports the full set of performance indicators for each of the best-performing models identified in Table 7.

In general, the RF algorithm demonstrated good performance in predicting water level (also supported by the Taylor diagram presented in Figure 2, which illustrates the relationship between correlation coefficients and standard deviation ratios). For water level estimates, the NSE indicates that the models explain 67–75% of the variance, while KGE values are particularly encouraging, suggesting well-balanced performance in terms of correlation, bias, and variability. RMSE and MAE values are relatively low, indicating reasonable absolute errors. MAPE values (21.73–41.62%) are moderate but are clearly affected by low observed values. In some cases, this is due to a large number of observations approaching zero (particularly when predicting flow or synthetic difference variables). The negative PBIAS values (−0.17 to −0.30%) are very close to zero; under the sign convention of Equation (15) (simulated minus observed), they indicate a slight systematic underestimation. By contrast, discharge predictions show considerably weaker performance, while synthetic variables yield the poorest overall results.

Once the best models had been identified and the fine-tuning strategy defined in Section 2.4 had been applied, model performance improved. Figure 3 presents the scatter plots and Figure 4, Figure 5 and Figure 6 show the residual characteristics of the fine-tuned models for water-level-related outputs.

In general terms, the residuals vs. predicted values plots indicate that the models do not exhibit a strong systematic bias across most prediction ranges. The dispersion of residuals increases as the predicted values become larger, which suggests the presence of heteroscedasticity. Consequently, the models appear to perform more accurately for low and medium values. The histogram of the residuals’ distribution shows that the residuals are strongly concentrated around zero, though the distribution exhibits noticeable asymmetry and heavier tails than expected under a normal distribution. In particular, the right tail is longer, suggesting the existence of occasional underestimations of extreme values (see Figure 7, Figure 8 and Figure 9 for examples). This deviation from normality is confirmed by the Q–Q plot. This pattern may indicate temporal autocorrelation or the occurrence of anomalous hydrological conditions that are more difficult for the model to reproduce. Furthermore, the presence of residual clustering suggests that some temporal dependencies remain unexplained by the predictor variables included in the model.

The feature importance analysis (Figure 10) reveals that five variables (including channel length, highest catchment elevation, average root-zone soil moisture, and surface soil moisture) explain most of the output variance. In particular:

(a): Channel length reduces the variability of the resulting data by 21.6%, 12.0%, and 10.3% after being used to split the sample.
(b): Highest catchment elevation reduces the variability of the resulting data by 14.8%, 11.2%, and 9.8% after being used to split the sample.
(c): Average surface soil moisture reduces the variability of the resulting data by 18.7%, 17.0%, and 19.2% after being used to split the sample.
(d): Average root-zone soil moisture reduces the variability of the resulting data by 12.2%, 16.1%, and 17.0% after being used to split the sample.
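Percentages of this kind, which express how much each variable reduces variability when used to split the sample, correspond to the impurity-based importance of tree ensembles. The sketch below assumes that this is the measure reported and shows how it is obtained from a scikit-learn Random Forest; the data are synthetic and the feature names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
names = ["channel_length", "max_elevation", "surface_sm", "noise"]  # hypothetical
X = rng.normal(size=(300, 4))  # synthetic predictors, not real data
# Synthetic target driven mainly by the first and third features.
y = 2.0 * X[:, 0] + 1.0 * X[:, 2] + rng.normal(scale=0.1, size=300)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ holds the normalized mean decrease in impurity;
# the values sum to 1 and can be read as shares of variance reduction.
importance = dict(zip(names, forest.feature_importances_))
ranked = sorted(importance, key=importance.get, reverse=True)
```

Impurity-based importances tend to favor features with many distinct values, so a permutation-based check is a common complement when the ranking matters.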

This set of variables suggests that both the transient response (represented by channel length and maximum catchment elevation) and the initial soil moisture conditions (represented by both soil moisture levels) are of great importance in determining the observed water levels. Meanwhile, the Green–Ampt parameters are represented solely by the wetting front suction head. Finally, precipitation is included among the most representative features in the maximum water level models, while being less relevant in the other two models.

4. Discussion

Hydrological modeling is moving toward approaches that integrate physical understanding with data-driven learning capabilities [30,31]. Our approach differs fundamentally from purely data-driven models by incorporating hydrological knowledge directly into the model structure through physics-informed input configurations. Rather than attempting to combine existing physical and data-driven models into ensemble frameworks, data-driven algorithms were constructed to mimic physical laws. This approach is conceptually aligned with recent work by Nazari et al. [32], who used neural networks to replicate the Saint-Venant equations using synthetic data, and Liang et al. [33], who assessed the capability of various machine learning algorithms to reproduce outputs from physically based numerical models.

The superior performance of models based on Equation (6) suggests that the rainfall–runoff transformation process, as conceptualized in classical infiltration theory, captures the dominant hydrological dynamics in our study catchments more effectively than approaches emphasizing the full water balance. The superiority of models shaped by physical principles (either causal or clustered autoregressive) over pure autoregressive approaches observed in the case studies suggests that purely temporal patterns do not fully capture the complexity of hydrological processes.

Delving into the causality observed in this work, the prominence of channel length and maximum elevation reflects the importance of catchment morphology in determining travel times and flow concentration, fundamental concepts in hydrological response theory [34]. Advantageously, and in contrast with other inputs, these descriptors can be determined without extensive field campaigns. The role of initial soil moisture conditions observed in this manuscript aligns with literature demonstrating that antecedent wetness exerts dominant control on runoff generation, particularly in small catchments [35,36]. However, the estimation of this variable, unlike the aforementioned morphometric ones, is affected by greater uncertainty. Part of this uncertainty stems from the use of satellite products and the catchment-average values derived from them; accuracy could be increased using data sources with greater spatial detail, such as on-site sensor networks. Notably, precipitation appears prominently only in the maximum water level model and is less influential for minimum and average water levels. This suggests that extreme events are more directly precipitation-driven, while baseflow and typical conditions are more strongly controlled by catchment storage and antecedent moisture states. This pattern is consistent with the understanding of hydrological processes, where event water dominates during floods while pre-event water sustains low flows [37].

The residual analysis exposes model limitations that merit attention for future improvements. The residuals highlight the challenge of capturing the full range of streamflow variability, particularly during extreme events [38], which complicates the prediction of potentially damaging episodes. RF algorithms are known to underestimate extremes because their predictions are averages over many trees; however, the underestimation of peaks is a limitation widely recognized in the literature [39], particularly in small catchments. The rapid response times, threshold behaviors, high spatial heterogeneity, and limited gauge density that characterize headwater systems create difficulties for both physical and data-driven approaches [40,41].
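The averaging argument can be illustrated with a toy ensemble. Each member predicts a mean of training targets, and the ensemble averages those means, so no prediction can exceed the largest value seen in training. In the sketch below the "trees" are deliberately reduced to nearest-neighbour means fitted on bootstrap resamples; this simplification, the function names, and all numbers are assumptions for illustration and do not reproduce the RF models of this study.

```python
import random

def toy_tree_predict(sample, x_query, k=3):
    """Stand-in for one tree: mean target of the k sampled points nearest to x_query."""
    nearest = sorted(sample, key=lambda pair: abs(pair[0] - x_query))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def toy_forest_predict(train, x_query, n_trees=200, seed=0):
    """Average the toy trees, each built on a bootstrap resample of the training pairs."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trees):
        sample = [rng.choice(train) for _ in train]
        preds.append(toy_tree_predict(sample, x_query))
    return sum(preds) / len(preds)

# Hypothetical (precipitation in mm, maximum water level in m) training pairs.
train = [(0, 0.20), (5, 0.25), (10, 0.32), (20, 0.50), (40, 0.95), (60, 1.60)]
peak_seen = max(y for _, y in train)

# A storm larger than anything in the training record.
prediction = toy_forest_predict(train, x_query=90)
print(f"largest level in training data: {peak_seen:.2f} m")
print(f"ensemble prediction for 90 mm:  {prediction:.2f} m")
```

Because every prediction is a weighted average of observed targets, the ensemble cannot extrapolate beyond the recorded peak, however large the forcing.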

The developed models predict low values during dry seasons more accurately, with some isolated overpredictions likely occurring because baseflow recession dynamics are difficult to capture with static catchment descriptors. Similarly, daily variability limits model accuracy, which can also be influenced by the potential prevalence of zero values. The models did not perform well in predicting the synthetic variables (Dif_y and Dif_q), which represent daily amplitude or variability. These variables were designed to capture sub-daily dynamics and threshold behaviors characteristic of flashy, small catchment responses. Their poor predictability suggests that models trained on daily aggregated data struggle to resolve within-day variability, even when that variability is expressed as a derived daily metric.
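A short sketch may help to show why a daily amplitude such as Dif_y carries information that daily averages do not. It collapses sub-daily water level readings into the daily variables defined in Table 2. The readings and the helper name `daily_summary` are hypothetical.

```python
def daily_summary(levels):
    """Collapse one day's sub-daily water levels (m) into the daily variables."""
    y_min, y_max = min(levels), max(levels)
    return {
        "ymin": y_min,
        "yav": sum(levels) / len(levels),
        "ymax": y_max,
        "Dif_y": y_max - y_min,  # daily amplitude, as defined in Table 2
    }

# Two hypothetical days with the same mean level but different within-day behavior.
steady_day = daily_summary([0.40, 0.40, 0.40, 0.40])
flashy_day = daily_summary([0.20, 0.20, 1.00, 0.20])

for name, day in (("steady", steady_day), ("flashy", flashy_day)):
    print(f"{name}: yav = {day['yav']:.2f} m, Dif_y = {day['Dif_y']:.2f} m")
```

The two days are indistinguishable in yav, yet their amplitudes differ widely, which is consistent with the difficulty of predicting Dif_y from daily aggregated inputs.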

Despite the limitations described above, this manuscript is built on the assumption that grounding data-driven approaches in clear physical causal principles will help minimize the effects of extreme events on model accuracy, require smaller training datasets, and help identify the most important variables for predicting extreme events.

5. Conclusions

This study developed and evaluated physics-informed data-driven models for predicting water levels and discharge in small catchments by defining model architectures based on physical theories. The results showed that data-driven approaches built on classical rainfall–runoff relationships outperformed approaches based solely on mass balance conservation, and that physically informed algorithms performed better than pure autoregressive approaches. Furthermore, RF-based models achieved good performance for water level predictions, while discharge predictions were considerably weaker.

The dominance of five key variables in explaining model outputs aligns well with hydrological process understanding: morphological characteristics (channel length and maximum elevation) control flow concentration and travel times, while soil moisture variables capture critical antecedent wetness conditions that govern runoff generation mechanisms. However, extreme events remain poorly captured, suggesting that current models must be improved for applications requiring accurate extreme event prediction. Future work should expand training datasets to better represent extremes, address the systematic underestimation of peak flows, incorporate sub-daily temporal resolution, and include uncertainty quantification. Moreover, testing model performance across multiple catchments with varying climatic, geological, and land cover characteristics would help assess the generalizability of these physics-informed approaches.

The research conducted on the selected case studies indicates that by structuring data-driven models around established hydrological principles while retaining the flexibility and efficiency of machine learning algorithms, improved predictive performance can be achieved while maintaining a degree of physical interpretability.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/su18136381/s1. The physical and territorial characteristics of the analyzed catchments are provided in the supplementary dataset SupplementaryInformation1.csv.

Author Contributions

Conceptualization, S.Z. and R.N.; methodology, S.Z. and V.G.; software, S.Z. and V.G.; validation, S.Z., R.N. and V.G.; formal analysis, R.N.; investigation, S.Z., R.N. and V.G.; resources, S.Z.; data curation, S.Z., R.N. and V.G.; writing—original draft preparation, S.Z., R.N. and V.G.; writing—review and editing, S.Z., R.N. and V.G.; visualization, V.G. and S.Z.; supervision, S.Z.; project administration, S.Z.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is part of the project TED2021-131520B-C21, funded by MCIN/AEI/10.13039/501100011033 and the European Union “NextGenerationEU”/PRTR.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors want to explicitly thank the Spanish Confederación Hidrográfica del Ebro for sharing the data and for its willingness to collaborate by providing the information required.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Adombi, A.V.D.P. Scientific machine learning in hydrology: A unified perspective. Earth Sci. Inform. 2025, 18, 522.
2. Paniconi, C.; Putti, M. Physically based modeling in catchment hydrology at 50: Survey and outlook. Water Resour. Res. 2015, 51, 7090–7129.
3. Mosaffa, H.; Sadeghi, M.; Mallakpour, I.; Jahromi, M.N.; Pourghasemi, H.R. Application of machine learning algorithms in hydrology. In Computers in Earth and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2022; pp. 585–591.
4. Muñoz-Carpena, R.; Carmona-Cabrero, A.; Yu, Z.; Fox, G.; Batelaan, O. Convergence of mechanistic modeling and artificial intelligence in hydrologic science and engineering. PLoS Water 2023, 2, e0000059.
5. Zanella, A.; Zubelzu, S.; Bennis, M. Sensor networks, data processing, and inference: The hydrology challenge. IEEE Access 2023, 11, 107823–107842.
6. Baste, S.; Klotz, D.; Acuña Espinoza, E.; Bardossy, A.; Loritz, R. Unveiling the limits of deep learning models in hydrological extrapolation tasks. Hydrol. Earth Syst. Sci. 2025, 29, 5871–5891.
7. Parisouj, P.; Mokari, E.; Mohebzadeh, H.; Goharnejad, H.; Jun, C.; Oh, J.; Bateni, S.M. Physics-informed data-driven model for predicting streamflow: A case study of the Voshmgir Basin, Iran. Appl. Sci. 2022, 12, 7464.
8. Lu, D.; Konapala, G.; Painter, S.L.; Kao, S.C.; Gangrade, S. Streamflow simulation in data-scarce basins using Bayesian and physics-informed machine learning models. J. Hydrometeorol. 2021, 22, 1421–1438.
9. Zhong, L.; Lei, H.; Yang, J. Development of a distributed physics-informed deep learning hydrological model for data-scarce regions. Water Resour. Res. 2024, 60, e2023WR036333.
10. Zhong, L.; Lei, H.; Gao, B. Developing a physics-informed deep learning model to simulate runoff response to climate change in alpine catchments. Water Resour. Res. 2023, 59, e2022WR034118.
11. Zhao, Y.; Chadha, M.; Barthlow, D.; Yeates, E.; Mcknight, C.J.; Memarsadeghi, N.P.; Hu, Z. Physics-enhanced machine learning models for streamflow discharge forecasting. J. Hydroinform. 2024, 26, 2506–2537.
12. Zhang, M.; Yao, T.; Gu, H.; Wang, W.; Pan, L.; Lu, B. A Hybrid Runoff Forecasting Framework Integrating Hydrological Physics and Data-Driven Models. Sustainability 2025, 17, 11120.
13. Liu, B.; Tang, Q.; Zhao, G.; Gao, L.; Shen, C.; Pan, B. Physics-guided long short-term memory network for streamflow and flood simulations in the Lancang–Mekong river basin. Water 2022, 14, 1429.
14. Xu, Q.; Shi, Y.; Bamber, J.L.; Tuo, Y.; Ludwig, R.; Zhu, X.X. Physics-aware machine learning revolutionizes scientific paradigm for process-based modeling in hydrology. Earth-Sci. Rev. 2025, 271, 105276.
15. Green, W.H.; Ampt, G.A. Studies on Soil Physics. J. Agric. Sci. 1911, 4, 1–24.
16. Soil Conservation Service (SCS). National Engineering Handbook, Section 4—Hydrology; U.S. Department of Agriculture: Washington, DC, USA, 1985.
17. Van Genuchten, M.T. A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Sci. Soc. Am. J. 1980, 44, 892–898.
18. Mualem, Y. Hysteretical models for prediction of the hydraulic conductivity of unsaturated porous media. Water Resour. Res. 1976, 12, 1248–1254.
19. Confederación Hidrográfica del Ebro (CHE). Sistema Automático de Información Hidrológica. 2025. Available online: https://www.saihebro.com/homepage/estado-cuenca-ebro (accessed on 18 May 2026).
20. Ministerio de Agricultura, Pesca y Alimentación (MAPA). Sistema de Información y Asesoramiento al Regante (SIAR). 2026. Available online: https://servicio.mapa.gob.es/siarweb/consultaDatos/inicio (accessed on 18 May 2026).
21. Servei Meteorològic de Catalunya (CAT). Servicio Meteorológico Català. Gobierno de Cataluña. 2025. Available online: https://es.meteocat.gencat.cat/?lang=es (accessed on 18 May 2026).
22. Euskalmet—Agencia Vasca de Meteorología (PV). 2025. Available online: https://www.euskalmet.euskadi.eus/el-tiempo/euskadi/ (accessed on 18 May 2026).
23. NASA National Snow and Ice Data Center. SMAP L4 Global 3-Hourly 9 km EASE-Grid Surface and Root Zone Soil Moisture Geophysical Data, Version 7; NASA NSIDC DAAC: Boulder, CO, USA, 2025.
24. Almeida-Ñauñay, A.F.; Sanz, E.; Berlanga, A.; Patricio, M.Á.; Molina, J.M.; Zubelzu, S. Development of Open-Source Tools for Event-Based Hydrological Modelling Using GIS and Python. Water 2025, 17, 2160.
25. Instituto Geográfico Nacional (IGN). Modelo Digital del Terreno 2ª Cobertura (2015–2021) con Paso de Malla de 2 Metros [Cartografía Digital]—1:25.000; Instituto Geográfico Nacional: Madrid, Spain, 2021.
26. Instituto Geográfico Nacional (IGN). Sistema de Ocupación del Suelo de España (SIOSE) [Cartografía Digital]—1:25.000; Instituto Geográfico Nacional: Madrid, Spain, 2014.
27. Poggio, L.; De Sousa, L.M.; Batjes, N.H.; Heuvelink, G.B.; Kempen, B.; Ribeiro, E.; Rossiter, D. SoilGrids 2.0: Producing soil information for the globe with quantified spatial uncertainty. Soil 2021, 7, 217–240.
28. Carsel, R.F.; Parrish, R.S. Developing joint probability distributions of soil water retention characteristics. Water Resour. Res. 1988, 24, 755–769.
29. Neuman, S.P. Wetting front pressure head in the infiltration model of Green and Ampt. Water Resour. Res. 1976, 12, 564–566.
30. Nearing, G.S.; Kratzert, F.; Sampson, A.K.; Pelissier, C.S.; Klotz, D.; Frame, J.M.; Gupta, H.V. What role does hydrological science play in the age of machine learning? Water Resour. Res. 2021, 57, e2020WR028091.
31. Shen, C.; Appling, A.P.; Gentine, P.; Bandai, T.; Gupta, H.; Tartakovsky, A.; Lawson, K. Differentiable modelling to unify machine learning and physical models for geosciences. Nat. Rev. Earth Environ. 2023, 4, 552–567.
32. Nazari, L.F.; Camponogara, E.; Seman, L.O. Physics-informed neural networks for modeling water flows in a river channel. IEEE Trans. Artif. Intell. 2022, 5, 1001–1015.
33. Liang, J.; Li, W.; Bradford, S.A.; Šimůnek, J. Physics-informed data-driven models to predict surface runoff water quantity and quality in agricultural fields. Water 2019, 11, 200.
34. Rodriguez-Iturbe, I.; Rinaldo, A. Fractal River Basins: Chance and Self-Organization; Cambridge University Press: Cambridge, UK, 1997.
35. Merz, R.; Blöschl, G. A regional analysis of event runoff coefficients with respect to climate and catchment characteristics in Austria. Water Resour. Res. 2009, 45, W01405.
36. Zehe, E.; Sivapalan, M. Threshold behaviour in hydrological systems as (human) geo-ecosystems: Manifestations, controls, implications. Hydrol. Earth Syst. Sci. 2009, 13, 1273–1297.
37. Kirchner, J.W. A double paradox in catchment hydrology and geochemistry. Hydrol. Process. 2003, 17, 871–874.
38. Kratzert, F.; Klotz, D.; Herrnegger, M.; Sampson, A.K.; Hochreiter, S.; Nearing, G.S. Toward improved predictions in ungauged basins: Exploiting the power of machine learning. Water Resour. Res. 2019, 55, 11344–11354.
39. Bárdossy, A.; Anwar, F. Why do our rainfall–runoff models keep underestimating the peak flows? Hydrol. Earth Syst. Sci. 2023, 27, 1987–2000.
40. Kirchner, J.W. Getting the right answers for the right reasons: Linking measurements, analyses, and models to advance the science of hydrology. Water Resour. Res. 2006, 42, W03S04.
41. Blöschl, G. (Ed.) Runoff Prediction in Ungauged Basins: Synthesis Across Processes, Places and Scales; Cambridge University Press: Cambridge, UK, 2013.

Figure 1. Location of the analyzed catchments and streamflow gauging stations (triangles).

Figure 2. Relationship between correlation coefficient and standard deviation ratio (labels indicate the algorithm associated with each point, while colors represent performance categories related to algorithms: blue = good -RF-, green = moderate -XGBoost-, red = low performance -SVM-). Grey dashed lines represent isolines of Pearson correlation coefficient between predicted and observed values. Blue arcs centred on the origin represent isolines of the standard deviation ratio (σ_predicted/σ_observed).

Figure 3. Scatterplots of predicted vs. observed water level variables (ymin: (a), yav: (c), ymax: (b)). Dots represent individual predicted–observed pairs, and the dashed line indicates the 1:1 identity line.

Figure 4. Residuals vs. predicted values scatterplot (points) with a horizontal zero reference line indicating no error (a), histogram of the residuals distribution (b), normal Q-Q plot showing sample quantiles (points) against theoretical normal quantiles, with the straight line representing the line of perfect agreement (c), and residuals over time (points) with a horizontal zero reference line indicating no error (d) for ymin.

Figure 5. Residuals vs. predicted values scatterplot (points) with a horizontal zero reference line indicating no error (a), histogram of the residuals distribution (b), normal Q-Q plot showing sample quantiles (points) against theoretical normal quantiles, with the straight line representing the line of perfect agreement (c), and residuals over time (points) with a horizontal zero reference line indicating no error (d) for yav.

Figure 6. Residuals vs. predicted values scatterplot (points) with a horizontal zero reference line indicating no error (a), histogram of the residuals distribution (b), normal Q-Q plot showing sample quantiles (points) against theoretical normal quantiles, with the straight line representing the line of perfect agreement (c), and residuals over time (points) with a horizontal zero reference line indicating no error (d) for ymax.

Figure 7. Examples from Anduña (a), Bailin (b) and Deza (c) catchments of model predictions vs. records for ymin.

Figure 8. Examples from Anduña (a), Bailin (b) and Deza (c) catchments of model predictions vs. records for yav.

Figure 9. Examples from Anduña (a), Bailin (b) and Deza (c) catchments of model predictions vs. records for ymax.

Figure 10. Feature importance for ymin (a), yav (b) and ymax (c).

Table 1. River names and locations of the streamflow gauging stations used to define the contributing catchments considered as case studies in this work.

River	Water Level Gauging Point Location
Izalzu	Anduña
Bailin	Sabiñánigo
Deza	Embid de Ariza
Flamisell	Cabdella
Garona	Bossost
Isuela	Trasobares
Larraun	Iribas
Nela	Villarcayo
Oroncillo	Orón
Rudron	Valdelateja
Sangüesa	Onsella
Subialde	Larrinoa
Tiron	San Miguel Pedroso
Trueba	Medina de Pomar
Ubagua	Riezu
Urrobi	Espinal
Vallfarrera	Allins
Yanguas	Yanguas
Zatoya	Ochagavia
Zidacos	Garinoáin

Table 2. Summary of the collected data, variable names, and derived synthetic variables used in this study.

Type	Symbol	Variable
Original	tmax	Maximum temperature
Original	tav	Average temperature
Original	tmin	Minimum temperature
Original	HRav	Average relative humidity
Original	HRmax	Maximum relative humidity
Original	HRmin	Minimum relative humidity
Original	wv	Wind speed
Original	wvmax	Maximum wind speed
Original	SR	Solar radiation
Original	ET0	Potential evapotranspiration
Original	P	Precipitation
Original	ymin	Minimum measured water level
Original	yav	Average measured water level
Original	ymax	Maximum measured water level
Original	qmin	Minimum measured outflow
Original	qav	Average measured outflow
Original	qmax	Maximum measured outflow
Original	θ0r	Initial soil water content at 00:00 h rootzone
Original	θ0s	Initial soil water content at 00:00 h surface
Synthetic	Dif_t	Daily difference between maximum and minimum temperature
Synthetic	Dif_HR	Daily difference between maximum and minimum relative humidity
Synthetic	Dif_q	Daily difference between maximum and minimum flow values
Synthetic	Dif_y	Daily difference between maximum and minimum level values
Synthetic	Dif_θr	Difference between maximum and minimum soil moisture rootzone
Synthetic	Dif_θs	Difference between maximum and minimum soil moisture surface
Synthetic	Av_θr	Daily average soil moisture, rootzone
Synthetic	Av_θs	Daily average soil moisture, surface

Table 3. Summary of the physical and territorial information used as model inputs and their corresponding data sources.

Symbol	Variable
CN_we	Catchment’s Curve Number Average (SIOSE vector from IGN [26]).
DT_CN	Standard deviation of Curve Number values (SIOSE vector from IGN [26]).
area_catch	Basin Area (IGN 2 m DEM from IGN [25]).
chan_length	Length of the main channel (IGN 2 m DEM from IGN [25]).
z_min	Minimum height of the basin (IGN 2 m DEM from IGN [25]).
z_max	Maximum height of the basin (IGN 2 m DEM from IGN [25]).
n	Manning’s average catchment roughness (SIOSE vector from IGN [26]).
DT_n	Standard deviation of Manning’s catchment roughness (SIOSE vector from IGN [26]).
w	Average channel cross section area (IGN 2 m DEM from IGN [25]).
DT_ks	Saturated hydraulic conductivity, SD of pixel’s values from [27]
mean_ks	Saturated hydraulic conductivity, average of pixel’s values from [27]
mean_thetas	Saturated soil moisture (Carsel and Parrish, 1988 [28]). Average of pixel’s values from [27]
DT_thetas	Saturated soil moisture (Carsel and Parrish, 1988 [28]). SD of pixel’s values from [27]
mean_thetar	Residual soil moisture (Carsel and Parrish, 1988 [28]). Average of pixel’s values from [27]
DT_thetar	Residual soil moisture (Carsel and Parrish, 1988 [28]). SD of pixel’s values from [27]
mean_thau	Suction head wetting front (Neuman, 1976 [29]). Average of pixel’s values from [27]
DT_thau	Suction head wetting front (Neuman, 1976 [29]). SD of pixel’s values from [27]
mean_swcfc	Soil moisture at field capacity. Average of pixel’s values from [27]
DT_swcfc	Soil moisture at field capacity. SD of pixel’s values from [27]
mean_swcpwp	Soil moisture at permanent wilting point. Average of pixel’s values from [27]
DT_swcpwp	Soil moisture at permanent wilting point. SD of pixel’s values from [27]

Table 4. Model configurations (inputs and outputs) replicating the causal relationships presented in Equations (6), (7), (10a) and (10b).

Approach	Outputs ¹	Inputs	Acronym
Equation (6)	ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q	P, θ0r, θ0s, DT_ks, mean_ks, mean_thetas, DT_thetas, mean_thetar, DT_thetar, mean_thau, DT_thau, n, DT_n, w, area_catch, chan_length, z_min, z_max	Exp (6a)
Equation (6)	ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q	P, Av_θr, Av_θs, DT_ks, mean_ks, mean_thetas, DT_thetas, mean_thetar, DT_thetar, mean_thau, DT_thau, n, DT_n, w, area_catch, chan_length, z_min, z_max	Exp (6b)
Equation (7)	ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q	P, θ0r, θ0s, CN_we, CN_DT, n, N_DT, w, area_catch, chan_length, z_min, z_max	Exp (7a)
Equation (7)	ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q	P, Av_θr, Av_θs, CN_we, CN_DT, n, N_DT, w, area_catch, chan_length, z_min, z_max	Exp (7b)
Equation (10a)	ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q	P, Dif_θr, Dif_θs, tav, tmax, tmin, HRav, HRmax, HRmin, wv, wvmax, SR, mean_swcfc, DT_swcfc, mean_swcpwp, DT_swcpwp, n, N_DT, w, area_catch, chan_length, z_min, z_max	Exp (10a)
Equation (10b)	ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q	P, Dif_θr, Dif_θs, ET0, mean_swcfc, DT_swcfc, mean_swcpwp, DT_swcpwp, n, N_DT, w, area_catch, chan_length, z_min, z_max	Exp (10b)

¹ Each of the outputs included in this column was modeled separately using the inputs listed in the adjacent cell.
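As the footnote indicates, every experiment in Table 4 yields one model per output. A minimal sketch of how such a configuration could be enumerated is given below, using the input lists of Exp (7a) and Exp (7b) exactly as printed in the table. The container names (`SHARED`, `EXPERIMENTS`, `OUTPUTS`) and the helper `model_specs` are hypothetical and are not taken from the authors' software.

```python
# Inputs copied from the Exp (7a) and Exp (7b) rows of Table 4.
SHARED = ["CN_we", "CN_DT", "n", "N_DT", "w", "area_catch",
          "chan_length", "z_min", "z_max"]
EXPERIMENTS = {
    "Exp (7a)": ["P", "θ0r", "θ0s"] + SHARED,
    "Exp (7b)": ["P", "Av_θr", "Av_θs"] + SHARED,
}
OUTPUTS = ["ymin", "yav", "ymax", "qmin", "qav", "qmax", "Dif_y", "Dif_q"]

def model_specs(experiments, outputs):
    """Return one (acronym, target, inputs) spec per separately fitted model."""
    return [
        (acronym, target, inputs)
        for acronym, inputs in experiments.items()
        for target in outputs
    ]

specs = model_specs(EXPERIMENTS, OUTPUTS)
print(f"{len(specs)} single-output models from {len(EXPERIMENTS)} experiments")
```

Keeping the input lists in one place makes it straightforward to confirm that two experiments differ only in the soil moisture variables they use.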

Table 5. Model configurations for the development of autoregressive models (inputs and outputs).

Approach	Conceptual Relationship Between Inputs and Output
Pure auto-regressive (all catchments; catchments clustered following GA; catchments clustered following CN)	ymin_t = f(ymin_t₋₁, ymin_t₋₂, …)
	yav_t = f(yav_t₋₁, yav_t₋₂, …)
	ymax_t = f(ymax_t₋₁, ymax_t₋₂, …)
	qmin_t = f(qmin_t₋₁, qmin_t₋₂, …)
	qav_t = f(qav_t₋₁, qav_t₋₂, …)
	qmax_t = f(qmax_t₋₁, qmax_t₋₂, …)
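The autoregressive relationships in Table 5 can be turned into a supervised learning problem by pairing each daily value with the values that precede it. The sketch below does this in plain Python. The number of lags is not stated in the table, so `n_lags` is left as a free parameter, and the series and function name are illustrative assumptions.

```python
def make_lagged_pairs(series, n_lags):
    """Build (inputs, target) pairs; inputs are the n_lags previous values, most recent first."""
    pairs = []
    for t in range(n_lags, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]
        pairs.append((lags, series[t]))
    return pairs

# Hypothetical daily minimum water levels (m).
ymin = [0.30, 0.31, 0.29, 0.35, 0.50, 0.42]

pairs = make_lagged_pairs(ymin, n_lags=2)
for lags, target in pairs:
    print(f"ymin_t = {target:.2f}  from  (ymin_t-1, ymin_t-2) = {lags}")
```

The first `n_lags` days have no complete history and therefore produce no training pair, which shortens the usable record slightly.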

Table 6. Hyperparameter tuning information for the algorithms with the highest predictive capacity.

Algorithm	Hyperparameter Tuning	Configurations Tested
RF	Number of estimators: 100, 200, 300. Maximum tree depth: 10, unrestricted depth. Minimum number of samples for node splitting: 2, 10. Minimum number of samples at leaf nodes: 1, 4. Number of features considered at each split: sqrt (all features), all features. Bootstrap aggregation: yes, no.	96
XGBoost	Number of estimators: 100, 200, 300. Maximum tree depth: 3, 6, 10. Learning rate: 0.01, 0.1. Subsampling ratios: 0.8, 1.0. Proportion of features sampled for each tree: 0.8, 1.0.	72
SVMR	Regularization parameter C: 1.0, 10.0, 100.0. Epsilon: 0.01, 0.1, 0.2. Gamma: “scale”, “auto”, 0.01, 0.1.	36
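The number of configurations in Table 6 is the product of the number of values tested for each hyperparameter. The sketch below writes the grids as dictionaries and counts the combinations. The key names follow scikit-learn and XGBoost conventions, which is an assumption, since the table does not state which library was used.

```python
from itertools import product

# Grids transcribed from Table 6; key names are assumed, values are from the table.
GRIDS = {
    "RF": {
        "n_estimators": [100, 200, 300],
        "max_depth": [10, None],          # None stands for unrestricted depth
        "min_samples_split": [2, 10],
        "min_samples_leaf": [1, 4],
        "max_features": ["sqrt", None],   # None stands for all features
        "bootstrap": [True, False],
    },
    "XGBoost": {
        "n_estimators": [100, 200, 300],
        "max_depth": [3, 6, 10],
        "learning_rate": [0.01, 0.1],
        "subsample": [0.8, 1.0],
        "colsample_bytree": [0.8, 1.0],
    },
    "SVMR": {
        "C": [1.0, 10.0, 100.0],
        "epsilon": [0.01, 0.1, 0.2],
        "gamma": ["scale", "auto", 0.01, 0.1],
    },
}

def count_configurations(grid):
    """Number of distinct hyperparameter combinations in a full grid search."""
    return len(list(product(*grid.values())))

counts = {name: count_configurations(grid) for name, grid in GRIDS.items()}
print(counts)
```

The resulting counts (96, 72, and 36) agree with the last column of Table 6.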

Table 7. MAE values of the best-performing models (selected based on minimum MAE) by theoretical approach and output.

Acronym	ymin	yav	ymax	qmin	qav	qmax	Dif_y	Dif_q
Exp (6a)	0.062	0.071	0.085	0.909	1.208	1.654	0.035	0.881
Exp (7a)	0.063	0.071	0.085	0.909	1.209	1.655	0.055	0.881
Exp (10a)	0.071	0.080	0.094	0.889	1.168	1.614	0.036	0.767
Exp (10b)	0.081	0.090	0.102	1.104	1.406	1.865	0.036	0.881
Auto-reg (clusters CN)	0.063	0.079	0.091	1.250	1.711	2.424	0.044	1.181
Auto-reg (clusters GA)	0.063	0.076	0.092	1.252	1.712	2.424	0.044	1.186
Auto-reg (all)	0.079	0.151	0.169	1.554	1.872	2.470	0.049	1.077

Table 8. Performance indicators of the best models selected following the minimum MAE value (for the validation dataset).

Output	Acronym	Algorithm	RMSE	MAE	NSE	KGE	PBIAS	MAPE
ymin	Exp6a	RF	0.10	0.062	0.75	0.82	−0.17	41.62
yav	Exp6a	RF	0.12	0.071	0.72	0.80	−0.24	21.73
ymax	Exp6a	RF	0.15	0.085	0.67	0.75	−0.30	31.07
qmin	Exp6a	RF	2.33	0.909	0.40	0.52	−1.27	145.2
qav	Exp6a	RF	3.18	1.20	0.40	0.51	−1.23	216.5
qmax	Exp6a	RF	4.49	1.65	0.42	0.54	−1.4	180.9
Dif_y	Exp6a	XGB	0.082	0.035	0.41	0.54	−0.78	108.1
Dif_q	Exp7a	SVMR	3.54	0.88	0.12	−0.03	56.18	1327.8
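For readers who wish to compute indicators of the kind reported in Table 8, the sketch below implements common textbook definitions in plain Python. Whether these match the exact formulations used for Table 8 is an assumption. In particular, the PBIAS sign convention varies between authors (here a negative value means underestimation), and KGE is written in its original three-component form. The observed and simulated values are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev

def rmse(obs, sim):
    return sqrt(mean([(s - o) ** 2 for o, s in zip(obs, sim)]))

def mae(obs, sim):
    return mean([abs(s - o) for o, s in zip(obs, sim)])

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mo = mean(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1 - num / den

def pbias(obs, sim):
    """Percent bias; assumed convention: negative means underestimation."""
    return 100 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)

def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    mo, ms = mean(obs), mean(sim)
    cov = mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    r = cov / (pstdev(obs) * pstdev(sim))
    alpha = pstdev(sim) / pstdev(obs)
    beta = ms / mo
    return 1 - sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def mape(obs, sim):
    """Mean absolute percentage error; undefined when any observation is zero."""
    return 100 * mean([abs((o - s) / o) for o, s in zip(obs, sim)])

# Hypothetical observed and simulated water levels (m), with one underestimated peak.
obs = [0.30, 0.35, 0.50, 0.90, 0.40]
sim = [0.32, 0.36, 0.48, 0.70, 0.41]

for name, fn in (("RMSE", rmse), ("MAE", mae), ("NSE", nse),
                 ("KGE", kge), ("PBIAS", pbias), ("MAPE", mape)):
    print(f"{name}: {fn(obs, sim):.3f}")
```

Because MAPE divides by the observed value, it grows rapidly when observations approach zero, which may help to explain the large MAPE values that accompany the discharge and amplitude outputs.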

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
