Physics-Informed Data-Driven Models for Streamflow Prediction in Small Catchments: Combining Hydrological Causality and Machine Learning Frameworks
Abstract
1. Introduction
2. Materials and Methods
2.1. General Framework
2.2. Case Studies and Data Availability
- (a) Small watersheds that can be readily parameterized from a physical perspective, facilitating the analysis of isolated hydrological processes such as rainfall, infiltration, and runoff.
- (b) Absence of internal storage structures, water abstractions, or inter-basin water transfers, thereby ensuring compliance with the law of mass conservation.
- (c) Presence of a single, clearly identifiable main channel.
- (d) Availability of sufficiently long, high-temporal-resolution historical records to enable the development of representative data-driven models.
2.3. Model Configuration
2.4. Data-Driven Algorithms
2.5. Performance Metrics
3. Results
- (a) Channel length reduces the variability of the resulting data by 21.6%, 12.0%, and 10.3% after being used to split the sample.
- (b) Highest catchment elevation reduces the variability of the resulting data by 14.8%, 11.2%, and 9.8% after being used to split the sample.
- (c) Average surface soil moisture reduces the variability of the resulting data by 18.7%, 17.0%, and 19.2% after being used to split the sample.
- (d) Average root-zone soil moisture reduces the variability of the resulting data by 12.2%, 16.1%, and 17.0% after being used to split the sample.
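
The percentages in the list above read like the relative variance reduction obtained when a regression tree splits the sample on a given variable. The following is a minimal sketch under that assumption; the function names, the threshold, and the toy data are hypothetical and are not taken from the study.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction_pct(x, y, threshold):
    """Percentage drop in the variance of y after splitting on x <= threshold.

    Assumption: 'variability' means the size-weighted variance of the two
    child samples compared with the variance of the parent sample.
    """
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    total = variance(y)
    if not left or not right or total == 0:
        return 0.0  # a degenerate split explains nothing
    weighted = (len(left) * variance(left) + len(right) * variance(right)) / len(y)
    return 100.0 * (total - weighted) / total


# Hypothetical toy data: channel length (km) against a flow response.
chan_length = [1.0, 2.0, 3.0, 4.0]
response = [1.0, 1.0, 3.0, 3.0]
print(variance_reduction_pct(chan_length, response, threshold=2.0))  # 100.0
```

A split that separates the two response levels perfectly removes all of the variance, which is why the toy example returns 100%. Real predictors such as channel length would give partial reductions of the kind listed above.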
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Adombi, A.V.D.P. Scientific machine learning in hydrology: A unified perspective. Earth Sci. Inform. 2025, 18, 522.
- Paniconi, C.; Putti, M. Physically based modeling in catchment hydrology at 50: Survey and outlook. Water Resour. Res. 2015, 51, 7090–7129.
- Mosaffa, H.; Sadeghi, M.; Mallakpour, I.; Jahromi, M.N.; Pourghasemi, H.R. Application of machine learning algorithms in hydrology. In Computers in Earth and Environmental Sciences; Elsevier: Amsterdam, The Netherlands, 2022; pp. 585–591.
- Muñoz-Carpena, R.; Carmona-Cabrero, A.; Yu, Z.; Fox, G.; Batelaan, O. Convergence of mechanistic modeling and artificial intelligence in hydrologic science and engineering. PLoS Water 2023, 2, e0000059.
- Zanella, A.; Zubelzu, S.; Bennis, M. Sensor networks, data processing, and inference: The hydrology challenge. IEEE Access 2023, 11, 107823–107842.
- Baste, S.; Klotz, D.; Acuña Espinoza, E.; Bardossy, A.; Loritz, R. Unveiling the limits of deep learning models in hydrological extrapolation tasks. Hydrol. Earth Syst. Sci. 2025, 29, 5871–5891.
- Parisouj, P.; Mokari, E.; Mohebzadeh, H.; Goharnejad, H.; Jun, C.; Oh, J.; Bateni, S.M. Physics-informed data-driven model for predicting streamflow: A case study of the Voshmgir Basin, Iran. Appl. Sci. 2022, 12, 7464.
- Lu, D.; Konapala, G.; Painter, S.L.; Kao, S.C.; Gangrade, S. Streamflow simulation in data-scarce basins using Bayesian and physics-informed machine learning models. J. Hydrometeorol. 2021, 22, 1421–1438.
- Zhong, L.; Lei, H.; Yang, J. Development of a distributed physics-informed deep learning hydrological model for data-scarce regions. Water Resour. Res. 2024, 60, e2023WR036333.
- Zhong, L.; Lei, H.; Gao, B. Developing a physics-informed deep learning model to simulate runoff response to climate change in alpine catchments. Water Resour. Res. 2023, 59, e2022WR034118.
- Zhao, Y.; Chadha, M.; Barthlow, D.; Yeates, E.; Mcknight, C.J.; Memarsadeghi, N.P.; Hu, Z. Physics-enhanced machine learning models for streamflow discharge forecasting. J. Hydroinform. 2024, 26, 2506–2537.
- Zhang, M.; Yao, T.; Gu, H.; Wang, W.; Pan, L.; Lu, B. A Hybrid Runoff Forecasting Framework Integrating Hydrological Physics and Data-Driven Models. Sustainability 2025, 17, 11120.
- Liu, B.; Tang, Q.; Zhao, G.; Gao, L.; Shen, C.; Pan, B. Physics-guided long short-term memory network for streamflow and flood simulations in the Lancang–Mekong river basin. Water 2022, 14, 1429.
- Xu, Q.; Shi, Y.; Bamber, J.L.; Tuo, Y.; Ludwig, R.; Zhu, X.X. Physics-aware machine learning revolutionizes scientific paradigm for process-based modeling in hydrology. Earth-Sci. Rev. 2025, 271, 105276.
- Green, W.H.; Ampt, G.A. Studies on Soil Physics. J. Agric. Sci. 1911, 4, 1–24.
- Soil Conservation Service (SCS). National Engineering Handbook, Section 4—Hydrology; U.S. Department of Agriculture: Washington, DC, USA, 1985.
- Van Genuchten, M.T. A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Sci. Soc. Am. J. 1980, 44, 892–898.
- Mualem, Y. Hysteretical models for prediction of the hydraulic conductivity of unsaturated porous media. Water Resour. Res. 1976, 12, 1248–1254.
- Confederación Hidrográfica del Ebro (CHE). Sistema Automático de Información Hidrológica. 2025. Available online: https://www.saihebro.com/homepage/estado-cuenca-ebro (accessed on 18 May 2026).
- Ministerio de Agricultura, Pesca y Alimentación (MAPA). Sistema de Información y Asesoramiento al Regante (SIAR). 2026. Available online: https://servicio.mapa.gob.es/siarweb/consultaDatos/inicio (accessed on 18 May 2026).
- Servei Meteorològic de Catalunya (CAT). Servicio Meteorológico Català. Gobierno de Cataluña. 2025. Available online: https://es.meteocat.gencat.cat/?lang=es (accessed on 18 May 2026).
- Euskalmet—Agencia Vasca de Meteorología (PV). 2025. Available online: https://www.euskalmet.euskadi.eus/el-tiempo/euskadi/ (accessed on 18 May 2026).
- NASA National Snow and Ice Data Center. SMAP L4 Global 3-Hourly 9 km EASE-Grid Surface and Root Zone Soil Moisture Geophysical Data, Version 7; NASA NSIDC DAAC: Boulder, CO, USA, 2025.
- Almeida-Ñauñay, A.F.; Sanz, E.; Berlanga, A.; Patricio, M.Á.; Molina, J.M.; Zubelzu, S. Development of Open-Source Tools for Event-Based Hydrological Modelling Using GIS and Python. Water 2025, 17, 2160.
- Instituto Geográfico Nacional (IGN). Modelo Digital del Terreno 2ª Cobertura (2015–2021) con Paso de Malla de 2 Metros [Cartografía Digital]—1:25.000; Instituto Geográfico Nacional: Madrid, Spain, 2021.
- Instituto Geográfico Nacional (IGN). Sistema de Ocupación del Suelo de España (SIOSE) [Cartografía Digital]—1:25.000; Instituto Geográfico Nacional: Madrid, Spain, 2014.
- Poggio, L.; De Sousa, L.M.; Batjes, N.H.; Heuvelink, G.B.; Kempen, B.; Ribeiro, E.; Rossiter, D. SoilGrids 2.0: Producing soil information for the globe with quantified spatial uncertainty. Soil 2021, 7, 217–240.
- Carsel, R.F.; Parrish, R.S. Developing joint probability distributions of soil water retention characteristics. Water Resour. Res. 1988, 24, 755–769.
- Neuman, S.P. Wetting front pressure head in the infiltration model of Green and Ampt. Water Resour. Res. 1976, 12, 564–566.
- Nearing, G.S.; Kratzert, F.; Sampson, A.K.; Pelissier, C.S.; Klotz, D.; Frame, J.M.; Gupta, H.V. What role does hydrological science play in the age of machine learning? Water Resour. Res. 2021, 57, e2020WR028091.
- Shen, C.; Appling, A.P.; Gentine, P.; Bandai, T.; Gupta, H.; Tartakovsky, A.; Lawson, K. Differentiable modelling to unify machine learning and physical models for geosciences. Nat. Rev. Earth Environ. 2023, 4, 552–567.
- Nazari, L.F.; Camponogara, E.; Seman, L.O. Physics-informed neural networks for modeling water flows in a river channel. IEEE Trans. Artif. Intell. 2022, 5, 1001–1015.
- Liang, J.; Li, W.; Bradford, S.A.; Šimůnek, J. Physics-informed data-driven models to predict surface runoff water quantity and quality in agricultural fields. Water 2019, 11, 200.
- Rodriguez-Iturbe, I.; Rinaldo, A. Fractal River Basins: Chance and Self-Organization; Cambridge University Press: Cambridge, UK, 1997.
- Merz, R.; Blöschl, G. A regional analysis of event runoff coefficients with respect to climate and catchment characteristics in Austria. Water Resour. Res. 2009, 45, W01405.
- Zehe, E.; Sivapalan, M. Threshold behaviour in hydrological systems as (human) geo-ecosystems: Manifestations, controls, implications. Hydrol. Earth Syst. Sci. 2009, 13, 1273–1297.
- Kirchner, J.W. A double paradox in catchment hydrology and geochemistry. Hydrol. Process. 2003, 17, 871–874.
- Kratzert, F.; Klotz, D.; Herrnegger, M.; Sampson, A.K.; Hochreiter, S.; Nearing, G.S. Toward improved predictions in ungauged basins: Exploiting the power of machine learning. Water Resour. Res. 2019, 55, 11344–11354.
- Bárdossy, A.; Anwar, F. Why do our rainfall–runoff models keep underestimating the peak flows? Hydrol. Earth Syst. Sci. 2023, 27, 1987–2000.
- Kirchner, J.W. Getting the right answers for the right reasons: Linking measurements, analyses, and models to advance the science of hydrology. Water Resour. Res. 2006, 42, W03S04.
- Blöschl, G. (Ed.) Runoff Prediction in Ungauged Basins: Synthesis Across Processes, Places and Scales; Cambridge University Press: Cambridge, UK, 2013.

| River | Water Level Gauging Point Location |
|---|---|
| Izalzu | Anduña |
| Bailin | Sabiñánigo |
| Deza | Embid de Ariza |
| Flamisell | Cabdella |
| Garona | Bossost |
| Isuela | Trasobares |
| Larraun | Iribas |
| Nela | Villarcayo |
| Oroncillo | Orón |
| Rudron | Valdelateja |
| Sangüesa | Onsella |
| Subialde | Larrinoa |
| Tiron | San Miguel Pedroso |
| Trueba | Medina de Pomar |
| Ubagua | Riezu |
| Urrobi | Espinal |
| Vallfarrera | Allins |
| Yanguas | Yanguas |
| Zatoya | Ochagavia |
| Zidacos | Garinoáin |

| Type | Symbol | Variable |
|---|---|---|
| Original | tmax | Maximum temperature |
| Original | tav | Average temperature |
| Original | tmin | Minimum temperature |
| Original | HRav | Average relative humidity |
| Original | HRmax | Maximum relative humidity |
| Original | HRmin | Minimum relative humidity |
| Original | wv | Wind speed |
| Original | wvmax | Maximum wind speed |
| Original | SR | Solar radiation |
| Original | ET0 | Potential evapotranspiration |
| Original | P | Precipitation |
| Original | ymin | Minimum measured water level |
| Original | yav | Average measured water level |
| Original | ymax | Maximum measured water level |
| Original | qmin | Minimum measured outflow |
| Original | qav | Average measured outflow |
| Original | qmax | Maximum measured outflow |
| Original | θ0r | Initial root-zone soil water content at 00:00 h |
| Original | θ0s | Initial surface soil water content at 00:00 h |
| Synthetic | Dif_t | Daily difference between maximum and minimum temperature |
| Synthetic | Dif_HR | Daily difference between maximum and minimum relative humidity |
| Synthetic | Dif_q | Daily difference between maximum and minimum flow values |
| Synthetic | Dif_y | Daily difference between maximum and minimum level values |
| Synthetic | Dif_θr | Difference between maximum and minimum root-zone soil moisture |
| Synthetic | Dif_θs | Difference between maximum and minimum surface soil moisture |
| Synthetic | Av_θr | Daily average root-zone soil moisture |
| Synthetic | Av_θs | Daily average surface soil moisture |
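
The synthetic variables in the table above are simple daily ranges and means of the original variables. As an illustration only, the sketch below derives them from one day of observations. The `DailyRecord` class, the function names, and the sample values are hypothetical, and it is an assumption that the soil-moisture range and average are taken over the sub-daily samples falling within each day.

```python
from dataclasses import dataclass


@dataclass
class DailyRecord:
    """One day of observations; field names follow the Symbol column."""
    tmax: float
    tmin: float
    HRmax: float
    HRmin: float
    qmax: float
    qmin: float
    ymax: float
    ymin: float


def daily_ranges(rec: DailyRecord) -> dict:
    """Daily max-min differences for temperature, humidity, flow, and level."""
    return {
        "Dif_t": rec.tmax - rec.tmin,
        "Dif_HR": rec.HRmax - rec.HRmin,
        "Dif_q": rec.qmax - rec.qmin,
        "Dif_y": rec.ymax - rec.ymin,
    }


def soil_moisture_summary(theta_samples):
    """Return (Dif_θ, Av_θ) for the sub-daily soil moisture samples of one day.

    Assumption: the range and mean are computed over all samples in the day.
    """
    dif = max(theta_samples) - min(theta_samples)
    av = sum(theta_samples) / len(theta_samples)
    return dif, av


# Hypothetical values, for illustration only.
rec = DailyRecord(tmax=25.0, tmin=10.0, HRmax=90.0, HRmin=40.0,
                  qmax=3.5, qmin=1.0, ymax=0.8, ymin=0.5)
print(daily_ranges(rec)["Dif_t"])  # 15.0
print(soil_moisture_summary([0.20, 0.30, 0.25, 0.25]))
```

The same `soil_moisture_summary` helper would be called once for the root-zone series and once for the surface series to obtain the four soil-moisture variables.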

| Symbol | Variable |
|---|---|
| CN_we | Catchment-average Curve Number (SIOSE vector layer from IGN [26]). |
| DT_CN | Standard deviation of Curve Number values (SIOSE vector layer from IGN [26]). |
| area_catch | Basin area (2 m DEM from IGN [25]). |
| chan_length | Length of the main channel (2 m DEM from IGN [25]). |
| z_min | Minimum elevation of the basin (2 m DEM from IGN [25]). |
| z_max | Maximum elevation of the basin (2 m DEM from IGN [25]). |
| n | Catchment-average Manning's roughness (SIOSE vector layer from IGN [26]). |
| DT_n | Standard deviation of Manning's catchment roughness (SIOSE vector layer from IGN [26]). |
| w | Average channel cross-section area (2 m DEM from IGN [25]). |
| DT_ks | Saturated hydraulic conductivity. SD of pixel values from [27]. |
| mean_ks | Saturated hydraulic conductivity. Average of pixel values from [27]. |
| mean_thetas | Saturated soil moisture (Carsel and Parrish, 1988 [28]). Average of pixel values from [27]. |
| DT_thetas | Saturated soil moisture (Carsel and Parrish, 1988 [28]). SD of pixel values from [27]. |
| mean_thetar | Residual soil moisture (Carsel and Parrish, 1988 [28]). Average of pixel values from [27]. |
| DT_thetar | Residual soil moisture (Carsel and Parrish, 1988 [28]). SD of pixel values from [27]. |
| mean_thau | Wetting-front suction head (Neuman, 1976 [29]). Average of pixel values from [27]. |
| DT_thau | Wetting-front suction head (Neuman, 1976 [29]). SD of pixel values from [27]. |
| mean_swcfc | Soil moisture at field capacity. Average of pixel values from [27]. |
| DT_swcfc | Soil moisture at field capacity. SD of pixel values from [27]. |
| mean_swcpwp | Soil moisture at permanent wilting point. Average of pixel values from [27]. |
| DT_swcpwp | Soil moisture at permanent wilting point. SD of pixel values from [27]. |

| Approach | Outputs | Inputs | Acronym |
|---|---|---|---|
| Equation (6) | ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q | P, θ0r, θ0s, DT_ks, mean_ks, mean_thetas, DT_thetas, mean_thetar, DT_thetar, mean_thau, DT_thau, n, DT_n, w, area_catch, chan_length, z_min, z_max | Exp (6a) |
| Equation (6) | ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q | P, Av_θr, Av_θs, DT_ks, mean_ks, mean_thetas, DT_thetas, mean_thetar, DT_thetar, mean_thau, DT_thau, n, DT_n, w, area_catch, chan_length, z_min, z_max | Exp (6b) |
| Equation (7) | ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q | P, θ0r, θ0s, CN_we, DT_CN, n, DT_n, w, area_catch, chan_length, z_min, z_max | Exp (7a) |
| Equation (7) | ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q | P, Av_θr, Av_θs, CN_we, DT_CN, n, DT_n, w, area_catch, chan_length, z_min, z_max | Exp (7b) |
| Equation (10a) | ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q | P, Dif_θr, Dif_θs, tav, tmax, tmin, HRav, HRmax, HRmin, wv, wvmax, SR, mean_swcfc, DT_swcfc, mean_swcpwp, DT_swcpwp, n, DT_n, w, area_catch, chan_length, z_min, z_max | Exp (10a) |
| Equation (10b) | ymin, yav, ymax, qmin, qav, qmax, Dif_y, Dif_q | P, Dif_θr, Dif_θs, ET0, mean_swcfc, DT_swcfc, mean_swcpwp, DT_swcpwp, n, DT_n, w, area_catch, chan_length, z_min, z_max | Exp (10b) |

| Approach | Conceptual Relationship Between Inputs and Output |
|---|---|
| Pure auto-regressive (all catchments; catchments clustered following GA; catchments clustered following CN) | $y_{\min,t} = f(y_{\min,t-1}, y_{\min,t-2}, \ldots)$ |
| | $y_{\mathrm{av},t} = f(y_{\mathrm{av},t-1}, y_{\mathrm{av},t-2}, \ldots)$ |
| | $y_{\max,t} = f(y_{\max,t-1}, y_{\max,t-2}, \ldots)$ |
| | $q_{\min,t} = f(q_{\min,t-1}, q_{\min,t-2}, \ldots)$ |
| | $q_{\mathrm{av},t} = f(q_{\mathrm{av},t-1}, q_{\mathrm{av},t-2}, \ldots)$ |
| | $q_{\max,t} = f(q_{\max,t-1}, q_{\max,t-2}, \ldots)$ |
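
The auto-regressive benchmark above predicts each output from its own past values. A minimal sketch of how such lagged training samples could be assembled is given below. The function name, the number of lags, and the toy series are hypothetical, since the lag depth used in the study is not stated here.

```python
def make_lagged_samples(series, n_lags):
    """Build (X, y) pairs for an auto-regressive model.

    Each row of X holds [s[t-1], s[t-2], ..., s[t-n_lags]] and the matching
    entry of y holds s[t], mirroring y_t = f(y_{t-1}, y_{t-2}, ...).
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append([series[t - k] for k in range(1, n_lags + 1)])
        y.append(series[t])
    return X, y


# Hypothetical daily average water levels, for illustration only.
yav = [1, 2, 3, 4, 5]
X, y = make_lagged_samples(yav, n_lags=2)
print(X)  # [[2, 1], [3, 2], [4, 3]]
print(y)  # [3, 4, 5]
```

The resulting `X` and `y` can be passed to any regressor. For the clustered variants, the same construction would be applied per catchment before pooling the samples within each cluster, so that lags never cross catchment boundaries.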

| Algorithm | Hyperparameter Tuning | Configurations Tested |
|---|---|---|
| RF | Number of estimators: 100, 200, 300. Maximum tree depth: 10, unrestricted. Minimum number of samples for node splitting: 2, 10. Minimum number of samples at leaf nodes: 1, 4. Number of features considered at each split: sqrt(all features), all features. Bootstrap aggregation: yes, no. | 96 |
| XGBoost | Number of estimators: 100, 200, 300. Maximum tree depth: 3, 6, 10. Learning rate: 0.01, 0.1. Subsampling ratio: 0.8, 1.0. Proportion of features sampled for each tree: 0.8, 1.0. | 72 |
| SVMR | Regularization parameter C: 1.0, 10.0, 100.0. Epsilon: 0.01, 0.1, 0.2. Gamma: "scale", "auto", 0.01, 0.1. | 36 |
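
The configuration counts in the table above are the sizes of the full Cartesian grids. The sketch below enumerates them with the standard library to confirm the totals. The parameter keys follow common scikit-learn and XGBoost naming, which is an assumption because the source does not name the libraries or the search routine used.

```python
from itertools import product

# Grids transcribed from the table; key names are assumed, values are as listed.
GRIDS = {
    "RF": {
        "n_estimators": [100, 200, 300],
        "max_depth": [10, None],          # None = unrestricted depth
        "min_samples_split": [2, 10],
        "min_samples_leaf": [1, 4],
        "max_features": ["sqrt", None],   # None = all features
        "bootstrap": [True, False],
    },
    "XGBoost": {
        "n_estimators": [100, 200, 300],
        "max_depth": [3, 6, 10],
        "learning_rate": [0.01, 0.1],
        "subsample": [0.8, 1.0],
        "colsample_bytree": [0.8, 1.0],
    },
    "SVMR": {
        "C": [1.0, 10.0, 100.0],
        "epsilon": [0.01, 0.1, 0.2],
        "gamma": ["scale", "auto", 0.01, 0.1],
    },
}


def enumerate_configs(grid):
    """Return every combination of the grid as a list of dicts."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


counts = {name: len(enumerate_configs(grid)) for name, grid in GRIDS.items()}
print(counts)  # {'RF': 96, 'XGBoost': 72, 'SVMR': 36}
```

Each dict returned by `enumerate_configs` could be unpacked into the corresponding estimator's constructor, for example inside a cross-validation loop.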

| Acronym | ymin | yav | ymax | qmin | qav | qmax | Dif_y | Dif_q |
|---|---|---|---|---|---|---|---|---|
| Exp (6a) | 0.062 | 0.071 | 0.085 | 0.909 | 1.208 | 1.654 | 0.035 | 0.881 |
| Exp (7a) | 0.063 | 0.071 | 0.085 | 0.909 | 1.209 | 1.655 | 0.055 | 0.881 |
| Exp (10a) | 0.071 | 0.080 | 0.094 | 0.889 | 1.168 | 1.614 | 0.036 | 0.767 |
| Exp (10b) | 0.081 | 0.090 | 0.102 | 1.104 | 1.406 | 1.865 | 0.036 | 0.881 |
| Auto-reg (clusters CN) | 0.063 | 0.079 | 0.091 | 1.250 | 1.711 | 2.424 | 0.044 | 1.181 |
| Auto-reg (clusters GA) | 0.063 | 0.076 | 0.092 | 1.252 | 1.712 | 2.424 | 0.044 | 1.186 |
| Auto-reg (all) | 0.079 | 0.151 | 0.169 | 1.554 | 1.872 | 2.470 | 0.049 | 1.077 |

| Output | Acronym | Algorithm | RMSE | MAE | NSE | KGE | PBIAS | MAPE |
|---|---|---|---|---|---|---|---|---|
| ymin | Exp (6a) | RF | 0.10 | 0.062 | 0.75 | 0.82 | −0.17 | 41.62 |
| yav | Exp (6a) | RF | 0.12 | 0.071 | 0.72 | 0.80 | −0.24 | 21.73 |
| ymax | Exp (6a) | RF | 0.15 | 0.085 | 0.67 | 0.75 | −0.30 | 31.07 |
| qmin | Exp (6a) | RF | 2.33 | 0.909 | 0.40 | 0.52 | −1.27 | 145.2 |
| qav | Exp (6a) | RF | 3.18 | 1.20 | 0.40 | 0.51 | −1.23 | 216.5 |
| qmax | Exp (6a) | RF | 4.49 | 1.65 | 0.42 | 0.54 | −1.40 | 180.9 |
| Dif_y | Exp (6a) | XGBoost | 0.082 | 0.035 | 0.41 | 0.54 | −0.78 | 108.1 |
| Dif_q | Exp (7a) | SVMR | 3.54 | 0.88 | 0.12 | −0.03 | 56.18 | 1327.8 |
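
For readers who want to reproduce the metric columns above, the sketch below gives commonly used definitions of RMSE, MAE, NSE, KGE, PBIAS, and MAPE. These formulas are assumptions: the exact variants used in the study are not given here, KGE is written in its widely used 2009 form, and the PBIAS sign convention (positive when the simulation overestimates) is only one of two in common use.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(_mean([(s - o) ** 2 for o, s in zip(obs, sim)]))


def mae(obs, sim):
    """Mean absolute error."""
    return _mean([abs(s - o) for o, s in zip(obs, sim)])


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mo = _mean(obs)
    num = sum((s - o) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta efficiency from correlation, variability ratio, and bias ratio."""
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(_mean([(o - mo) ** 2 for o in obs]))
    ss = math.sqrt(_mean([(s - ms) ** 2 for s in sim]))
    cov = _mean([(o - mo) * (s - ms) for o, s in zip(obs, sim)])
    r, alpha, beta = cov / (so * ss), ss / so, ms / mo
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def pbias(obs, sim):
    """Percent bias; positive means overestimation under this assumed convention."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def mape(obs, sim):
    """Mean absolute percentage error; undefined when an observation is zero."""
    return 100.0 * _mean([abs((s - o) / o) for o, s in zip(obs, sim)])


# Hypothetical toy series: every simulated value is one unit too high.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
print(rmse(obs, sim), mae(obs, sim))  # 1.0 1.0
print(round(nse(obs, sim), 3), round(kge(obs, sim), 3), round(pbias(obs, sim), 1))
```

Because MAPE divides each error by the observed value, observations close to zero inflate it sharply. That behaviour is one possible reason why percentage errors for flow ranges can be much larger than the corresponding MAE would suggest.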

© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Galán, V.; Navas, R.; Zubelzu, S. Physics-Informed Data-Driven Models for Streamflow Prediction in Small Catchments: Combining Hydrological Causality and Machine Learning Frameworks. Sustainability 2026, 18, 6381. https://doi.org/10.3390/su18136381

