Using Statistical and Machine Learning Algorithms for Big Data Applications in Hydrology

A special issue of Water (ISSN 2073-4441). This special issue belongs to the section "New Sensors, New Technologies and Machine Learning in Water Sciences".

Deadline for manuscript submissions: closed (15 December 2022) | Viewed by 15215

Special Issue Editors


Guest Editor
Department of Water Resources and Environmental Engineering, School of Civil Engineering, National Technical University of Athens, Iroon Polytechniou 5, 157 80 Zografou, Greece
Hellenic Air Force General Staff, Hellenic Air Force, Mesogion Avenue 227–231, 155 61 Cholargos, Greece
Interests: forecasting; machine learning; statistical hydrology; statistical learning

Guest Editor
1. Department of Water Resources and Environmental Modeling, Faculty of Environmental Sciences, Czech University of Life Sciences, Kamýcká 129, Praha-Suchdol, 16500 Prague, Czech Republic
2. Department of Engineering, Roma Tre University, Rome, Italy
3. Department of Civil Engineering, School of Engineering, University of Patras, University Campus, Rio, 26 504 Patras, Greece
4. Department of Water Resources and Environmental Engineering, School of Civil Engineering, National Technical University of Athens, Iroon Polytechniou 5, 157 80 Zografou, Greece
Interests: forecasting; machine learning; statistical hydrology; statistical learning

Special Issue Information

Dear Colleagues,

The use of statistical and machine learning algorithms in hydrology has increased rapidly over the last decade, driven by growing software and data availability. These algorithms often achieve high predictive performance. They are also easy to use and can be implemented in big data applications with satisfactory results.

In this Special Issue, contributions concerning the use of statistical and machine learning algorithms in modeling hydrological big data are welcome. The submitted manuscripts should meet the following requirements:

  • Applications should be based on freely available big datasets.
  • Benchmarking of complex algorithms against simpler ones is necessary.
  • Interpretation of the modeled phenomena is also necessary.
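
One way to read the benchmarking requirement is as a skill score measured against a simple reference forecast. The sketch below is only an illustration: the function names, the toy series, the use of mean squared error and the choice of a persistence benchmark are all assumptions made here, not criteria set by the Guest Editors.

```python
from statistics import mean


def mse(observed, predicted):
    """Mean squared error between two equally long sequences."""
    return mean((o - p) ** 2 for o, p in zip(observed, predicted))


def skill_score(observed, model_pred, benchmark_pred):
    """Relative improvement over a benchmark (1 = perfect, 0 = no gain)."""
    return 1.0 - mse(observed, model_pred) / mse(observed, benchmark_pred)


# Hypothetical toy series (illustrative values only, not real data).
flow = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0]

# Simple benchmark: persistence, i.e. the next value equals the current one.
observed = flow[1:]
persistence = flow[:-1]

# Stand-in for the predictions of a more complex algorithm.
complex_model = [11.5, 11.5, 14.0, 14.5, 13.5]

score = skill_score(observed, complex_model, persistence)
print(f"Skill relative to persistence: {score:.3f}")
```

A score near zero or below would suggest that the added complexity is not paying off on that dataset.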

Interested contributors are requested to first contact the Guest Editors.

Dr. Hristos Tyralis
Dr. Georgia Papacharalampous
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Water is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • forecasting
  • machine learning
  • statistical hydrology
  • statistical learning

Published Papers (5 papers)

Research

12 pages, 4377 KiB  
Article
Comparison between Machine-Learning-Based Turbidity Models Developed for Different Lake Zones in a Large Shallow Lake
by Runtao Hu, Wangchen Xu, Wenming Yan, Tingfeng Wu, Xiangyu He and Nannan Cheng
Water 2023, 15(3), 387; https://doi.org/10.3390/w15030387 - 17 Jan 2023
Cited by 1 | Viewed by 1316
Abstract
Machine learning has been used to mine the massive data collected by automatic environmental monitoring systems and predict the changes in the environmental factors in lakes. However, further study is needed to assess the feasibility of the development of a universal machine-learning-based turbidity model for a large shallow lake with considerable spatial heterogeneity in environmental factors. In this study, we collected and examined sediment and water quality data from Lake Taihu, China. Three monitoring stations were established in three lake zones to obtain continuous time series data of the water quality and meteorological variables. We used these data to develop three turbidity models based on long short-term memory (LSTM). The three zones differed in terms of environmental factors related to turbidity: in West Taihu, the Lake Center, and the mouth of Gonghu Bay, the critical shear stress of bed sediments was 0.029, 0.055, and 0.032 N m⁻², and the chlorophyll-a concentration was 23.27, 14.62, 30.80 μg L⁻¹, respectively. The LSTM-based turbidity model developed for any zone could predict the turbidity in the other two zones. For the model developed for West Taihu, its performance to predict the turbidity in the local zone (i.e., West Taihu) was inferior to that for the other zones; the reverse applied to the models developed for the Lake Center and Gonghu Bay. This can be attributed to the complex hydrodynamics in West Taihu, which weakens the learning of LSTM from the time series data. This study explores the feasibility of the development of a universal LSTM-based turbidity model for Lake Taihu and promotes the application of machine learning algorithms to large shallow lakes.

18 pages, 5643 KiB  
Article
LSTM-Based Model for Predicting Inland River Runoff in Arid Region: A Case Study on Yarkant River, Northwest China
by Jiaxin Li, Kaixuan Qian, Yuan Liu, Wei Yan, Xiuyun Yang, Geping Luo and Xiaofei Ma
Water 2022, 14(11), 1745; https://doi.org/10.3390/w14111745 - 29 May 2022
Cited by 8 | Viewed by 2292
Abstract
Inland river runoff variations in arid regions play a decisive role in maintaining regional ecological stability. Observation data of inland river runoff in arid regions have short time series and imperfect attributes due to limitations in the terrain environment and other factors. These shortages not only restrict the accurate simulation of inland river runoff in arid regions significantly, but also influence scientific evaluation and management of the water resources of a basin in arid regions. In recent years, research and applications of machine learning and in-depth learning technologies in the hydrological field have been developing gradually around the world. However, the simulation accuracy is low, and it often has over-fitting phenomenon in previous studies due to influences of complicated characteristics such as “unsteady runoff”. Fortunately, the circulation layer of Long-Short Term Memory (LSTM) can explore time series information of runoffs deeply to avoid long-term dependence problems. In this study, the LSTM algorithm was introduced and improved based on the in-depth learning theory of artificial intelligence and relevant meteorological factors that were monitored by coupling runoffs. The runoff data of the Yarkant River was chosen for training and test of the LSTM model. The results demonstrated that Mean Absolute Error (MAE) and Root Mean Square error (RMSE) of the LSTM model were 3.633 and 7.337, respectively. This indicates that the prediction effect and accuracy of the LSTM model were significantly better than those of the convolution neural network (CNN), Decision Tree Regressor (DTR) and Random Forest (RF). Comparison of accuracy of different models made the research reliable. Hence, time series data was converted into a problem of supervised learning through LSTM in the present study. 
The improved LSTM model solved prediction difficulties in runoff data to some extent and it applied to hydrological simulation in arid regions under several climate scenarios. It not only decreased runoff prediction uncertainty brought by heterogeneity of climate models and increased inland river runoff prediction accuracy in arid regions, but also provided references to basin water resource management in arid regions. In particular, the LSTM model provides an effective solution to runoff simulation in regions with limited data.
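
The abstract above mentions converting time series data into a supervised learning problem. A common way to do this is a sliding window over lagged values, sketched below. This is not the authors' code: the function name `make_supervised`, the lag count and the toy runoff values are hypothetical and chosen only for illustration.

```python
def make_supervised(series, n_lags):
    """Turn a 1-D series into (inputs, target) pairs using a sliding window."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be between 1 and len(series) - 1")
    samples = []
    for t in range(n_lags, len(series)):
        # The previous n_lags values are the inputs; the value at t is the target.
        samples.append((series[t - n_lags:t], series[t]))
    return samples


# Hypothetical toy runoff values (illustrative only, not real data).
runoff = [3.1, 3.4, 4.0, 5.2, 4.8, 4.1]

for inputs, target in make_supervised(runoff, n_lags=3):
    print(inputs, "->", target)
```

Each pair can then be fed to any regression model, whether a recurrent network or a simpler benchmark.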

20 pages, 6051 KiB  
Article
Time Series Features for Supporting Hydrometeorological Explorations and Predictions in Ungauged Locations Using Large Datasets
by Georgia Papacharalampous and Hristos Tyralis
Water 2022, 14(10), 1657; https://doi.org/10.3390/w14101657 - 23 May 2022
Cited by 10 | Viewed by 2712
Abstract
Regression-based frameworks for streamflow regionalization are built around catchment attributes that traditionally originate from catchment hydrology, flood frequency analysis and their interplay. In this work, we deviated from this traditional path by formulating and extensively investigating the first regression-based streamflow regionalization frameworks that largely emerge from general-purpose time series features for data science and, more precisely, from a large variety of such features. We focused on 28 features that included (partial) autocorrelation, entropy, temporal variation, seasonality, trend, lumpiness, stability, nonlinearity, linearity, spikiness, curvature and others. We estimated these features for daily temperature, precipitation and streamflow time series from 511 catchments and then merged them within regionalization contexts with traditional topographic, land cover, soil and geologic attributes. Precipitation and temperature features (e.g., the spectral entropy, seasonality strength and lag-1 autocorrelation of the precipitation time series, and the stability and trend strength of the temperature time series) were found to be useful predictors of many streamflow features. The same applies to traditional attributes such as the catchment mean elevation. Relationships between predictor and dependent variables were also revealed, while the spectral entropy, the seasonality strength and several autocorrelation features of the streamflow time series were found to be more regionalizable than others.
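
To make the idea of a time series feature concrete, the sketch below computes one of the features named in the abstract, the lag-1 autocorrelation, using the standard sample estimator. The exact feature definitions used in the paper are not given on this page, so the estimator, the function name and the toy series are assumptions.

```python
from statistics import mean


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a 1-D series."""
    m = mean(series)
    denominator = sum((x - m) ** 2 for x in series)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    numerator = sum(
        (series[t] - m) * (series[t - 1] - m) for t in range(1, len(series))
    )
    return numerator / denominator


# Hypothetical toy series (illustrative only, not real data).
trending = [1.0, 2.0, 3.0, 4.0, 5.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(lag1_autocorrelation(trending))     # positive: neighbours are similar
print(lag1_autocorrelation(alternating))  # negative: neighbours alternate
```

Computed per catchment, a value like this becomes one column in the table of attributes that a regionalization model uses as predictors or targets.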

15 pages, 2687 KiB  
Article
Quantile-Based Hydrological Modelling
by Hristos Tyralis and Georgia Papacharalampous
Water 2021, 13(23), 3420; https://doi.org/10.3390/w13233420 - 03 Dec 2021
Cited by 16 | Viewed by 3279
Abstract
Predictive uncertainty in hydrological modelling is quantified by using post-processing or Bayesian-based methods. The former methods are not straightforward and the latter ones are not distribution-free (i.e., assumptions on the probability distribution of the hydrological model’s output are necessary). To alleviate possible limitations related to these specific attributes, in this work we propose the calibration of the hydrological model by using the quantile loss function. By following this methodological approach, one can directly simulate pre-specified quantiles of the predictive distribution of streamflow. As a proof of concept, we apply our method in the frameworks of three hydrological models to 511 river basins in the contiguous US. We illustrate the predictive quantiles and show how an honest assessment of the predictive performance of the hydrological models can be made by using proper scoring rules. We believe that our method can help towards advancing the field of hydrological uncertainty.
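
The quantile loss mentioned in the abstract (also known as the pinball loss) can be sketched in a few lines. The example below shows only the loss itself and why minimizing it targets a chosen quantile; it does not reproduce the hydrological models or the calibration procedure of the paper. The function name, the quantile level 0.85 and the toy data are assumptions.

```python
def quantile_loss(observed, predicted, tau):
    """Average quantile (pinball) loss at quantile level tau, 0 < tau < 1."""
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(observed, predicted):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(observed)


# Hypothetical toy observations (illustrative only, not real data).
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

# Among constant predictions taken from the data, the one with the lowest
# loss at tau = 0.85 is a high quantile of the sample rather than its centre.
best = min(
    observed,
    key=lambda c: quantile_loss(observed, [c] * len(observed), 0.85),
)
print(best)
```

Replacing the constant with the output of a model, and the search with a calibration routine, gives the general shape of calibrating a model to a pre-specified quantile.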

17 pages, 4269 KiB  
Article
A Medium and Long-Term Runoff Forecast Method Based on Massive Meteorological Data and Machine Learning Algorithms
by Yujie Li, Jing Wei, Dong Wang, Bo Li, Huaping Huang, Bin Xu and Yueping Xu
Water 2021, 13(9), 1308; https://doi.org/10.3390/w13091308 - 07 May 2021
Cited by 13 | Viewed by 3337
Abstract
Accurate and reliable predictors selection and model construction are the key to medium and long-term runoff forecast. In this study, 130 climate indexes are utilized as the primary forecast factors. Partial Mutual Information (PMI), Recursive Feature Elimination (RFE) and Classification and Regression Tree (CART) are respectively employed as the typical algorithms of Filter, Wrapper and Embedded based on Feature Selection (FS) to obtain three final forecast schemes. Random Forest (RF) and Extreme Gradient Boosting (XGB) are respectively constructed as the representative models of Bagging and Boosting based on Ensemble Learning (EL) to realize the forecast of the three types of forecast lead time which contains monthly, seasonal and annual runoff sequences of the Three Gorges Reservoir in the Yangtze River Basin. This study aims to summarize and compare the applicability and accuracy of different FS methods and EL models in medium and long-term runoff forecast. The results show the following: (1) RFE method shows the best forecast performance in all different models and different forecast lead time. (2) RF and XGB models are suitable for medium and long-term runoff forecast but XGB presents the better forecast skills both in calibration and validation. (3) With the increase of the runoff magnitudes, the accuracy and reliability of forecast are improved. However, it is still difficult to establish accurate and reliable forecasts only large-scale climate indexes used. We conclude that the theoretical framework based on Machine Learning could be useful to water managers who focus on medium and long-term runoff forecast.
