Article

An Advanced Ensemble Machine Learning Framework for Estimating Long-Term Average Discharge at Hydrological Stations Using Global Metadata

by Alexandr Neftissov 1, Andrii Biloshchytskyi 2,3, Ilyas Kazambayev 1, Serhii Dolhopolov 3 and Tetyana Honcharenko 3,*

1 Research and Innovation Center “Industry 4.0”, Astana IT University, Astana 010000, Kazakhstan
2 University Administration, Astana IT University, Astana 010000, Kazakhstan
3 Department of Information Technology, Kyiv National University of Construction and Architecture, 03680 Kyiv, Ukraine
* Author to whom correspondence should be addressed.
Water 2025, 17(14), 2097; https://doi.org/10.3390/w17142097 (registering DOI)
Submission received: 26 May 2025 / Revised: 7 July 2025 / Accepted: 10 July 2025 / Published: 14 July 2025

Abstract

Accurate estimation of long-term average (LTA) discharge is fundamental for water resource assessment, infrastructure planning, and hydrological modeling, yet it remains a significant challenge, particularly in data-scarce or ungauged basins. This study introduces a machine learning framework that estimates LTA discharge from globally available hydrological station metadata provided by the Global Runoff Data Centre (GRDC). The methodology comprised data preprocessing, feature engineering, log-transformation of the target variable, and the development of multiple predictive models, including a custom deep neural network with specialized pathways and three gradient boosting machines (XGBoost, LightGBM, CatBoost). Hyperparameters were optimized using Bayesian techniques, and a weighted Meta Ensemble model, which combines predictions from the best individual models, was implemented. Performance was evaluated on an independent test set using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). The Meta Ensemble model performed best, achieving an R² of 0.954 on the test data and significantly surpassing both the baseline and the individual advanced models. Model interpretability analysis using SHAP (SHapley Additive exPlanations) confirmed that catchment area and geographical attributes are the dominant predictors. The resulting model provides a robust, accurate, and scalable data-driven solution for estimating LTA discharge, enhancing water resource assessment capabilities and supporting large-scale hydrological analysis.
Keywords: water resources assessment; hydraulic structures; machine learning; ensemble learning; discharge prediction; hydrological modeling
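
The abstract's log-transformed target, weighted ensemble, and R²/RMSE/MAE evaluation can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic discharge values, the two noisy "base models", the fixed weights, and the helper names `weighted_ensemble` and `regression_metrics` are all assumptions made for illustration, and the use of `log1p`/`expm1` is one possible choice of log transform that the abstract does not specify.

```python
import numpy as np


def weighted_ensemble(log_preds, weights):
    """Blend per-model predictions (rows) with weights normalized to sum to 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    # Contract the model axis: (n_models,) x (n_models, n_samples) -> (n_samples,)
    return np.tensordot(w, np.asarray(log_preds, dtype=float), axes=1)


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for two equally sized arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "mae": float(np.mean(np.abs(resid))),
    }


# Synthetic stand-in for long-term average discharge (hypothetical data).
rng = np.random.default_rng(0)
discharge = rng.lognormal(mean=3.0, sigma=1.5, size=200)

# Log-transform the target; log1p is an assumed variant of the transform.
y_log = np.log1p(discharge)

# Two hypothetical base models, simulated as noisy copies of the target.
log_preds = np.stack([
    y_log + rng.normal(0.0, 0.3, size=y_log.size),
    y_log + rng.normal(0.0, 0.5, size=y_log.size),
])

# Fixed illustrative weights; in practice these would come from validation.
blend_log = weighted_ensemble(log_preds, weights=[0.7, 0.3])

# Back-transform to the original discharge scale.
blend = np.expm1(blend_log)

metrics_log = regression_metrics(y_log, blend_log)
print(metrics_log)
```

Blending in log space and back-transforming afterwards keeps the very large rivers from dominating the average, which is one common motivation for transforming a heavily skewed target before modeling.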

