Research on Water Resource Modeling Based on Machine Learning Technologies

Liu, Ze; Zhou, Jingzhao; Yang, Xiaoyang; Zhao, Zechuan; Lv, Yang

doi:10.3390/w16030472



by Ze Liu ¹,²,*, Jingzhao Zhou ¹, Xiaoyang Yang ¹, Zechuan Zhao ¹ and Yang Lv ³

¹ College of Water Resources and Architectural Engineering, Northwest A&F University, Xianyang 712100, China

² Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas, Ministry of Education, Northwest A&F University, Xianyang 712100, China

³ College of Mechanical and Electronic Engineering, Northwest A&F University, Xianyang 712100, China

* Author to whom correspondence should be addressed.

Water 2024, 16(3), 472; https://doi.org/10.3390/w16030472

Submission received: 18 December 2023 / Revised: 26 January 2024 / Accepted: 27 January 2024 / Published: 31 January 2024

(This article belongs to the Special Issue Application of Machine Learning to Water Resource Modeling)


Abstract

Water resource modeling is an important means of studying the distribution, change, utilization, and management of water resources. By establishing various models, water resources can be quantitatively described and predicted, providing a scientific basis for water resource management, protection, and planning. Traditional hydrological observation methods, often reliant on experience and statistical methods, are time-consuming and labor-intensive, frequently resulting in predictions of limited accuracy. However, machine learning technologies enhance the efficiency and sustainability of water resource modeling by analyzing extensive hydrogeological data, thereby improving predictions and optimizing water resource utilization and allocation. This review investigates the application of machine learning for predicting various aspects, including precipitation, flood, runoff, soil moisture, evapotranspiration, groundwater level, and water quality. It provides a detailed summary of various algorithms, examines their technical strengths and weaknesses, and discusses their potential applications in water resource modeling. Finally, this paper anticipates future development trends in the application of machine learning to water resource modeling.

Keywords:

water resource; machine learning; precipitation; flood; runoff; soil moisture; evapotranspiration; groundwater level; water quality

1. Introduction

With the escalating severity of global water shortages, the management and efficient utilization of water resources have become a critical research focus [1,2]. China’s total freshwater resources amount to 2800 billion cubic meters, representing 6% of global water resources and ranking fourth globally, after Brazil, Russia, and Canada [3]. However, China’s per capita water resources stand at only 2300 cubic meters, about a quarter of the global average, placing it among the countries with the scarcest per capita water resources [4]. Precipitation in China decreases from the southeast coast to the northwest inland, and the country is divided into five zones: rainy, humid, semi-humid, semi-arid, and arid [5]. Owing to this uneven distribution of precipitation, China faces a pronounced imbalance between water supply and demand, particularly in the northern regions [6]. With population growth and economic development, the demand for water steadily increases while the supply remains limited, leading to water shortages in some areas [7]. In some regions of China, water use efficiency is noticeably low, primarily due to insufficient attention to water resources, a lack of effective water-saving measures, and weak institutional guarantees [8]. Furthermore, with accelerating industrialization and urbanization, substantial quantities of industrial and agricultural wastewater, domestic sewage, and solid waste are continuously discharged, severely polluting rivers, lakes, and groundwater [9]. Consequently, there is a critical need to strengthen the protection and utilization of water resources and to implement effective measures that improve water use efficiency [10].

Water resources are renewable, vary in quantity within and between years, and follow a distinct cycle and pattern (Figure 1). Driven by solar radiation and Earth’s gravity, water moves from one location to another through physical processes including evapotranspiration, precipitation, soil infiltration, surface runoff, and underground flow [11]. Water resources form a critical foundation for the survival and development of human society [12]. To improve their management and utilization, various computational models have been employed [13]. Traditional hydrological models are built on the principles of the hydrological cycle and explicitly describe the components of the hydrological cycle system [14]. These models have a clear physical meaning and are straightforward to interpret; however, their development is complex and requires extensive expertise [15]. Machine learning, by contrast, can process massive amounts of data, extract valuable information, and automatically construct models to predict future trends [16]. In water resource management, such data-driven models allow problems to be considered more comprehensively and support improved decision-making [17]. Consequently, owing to its robust data mining capabilities, machine learning is widely applied in areas such as water supply and demand prediction, flood risk management, water quality monitoring, and scenario forecasting [18,19,20,21,22].

Researchers have reviewed the application of machine learning in water resource modeling, laying the foundation for further in-depth research [23,24,25]. Mosaffa et al. reviewed applications of machine learning to floods, precipitation estimation, water quality, and groundwater, and proposed that machine learning outperforms traditional physical models in flood prediction [23]. Zounemat-Kermani et al. reviewed ensemble machine learning paradigms in hydrology and found that boosting, AdaBoost, and extreme gradient boosting outperform bagging, stacking, and other traditional methods in hydrological research [24]. Başağaoğlu et al. reviewed the use of interpretable and explainable machine learning models in diverse hydroclimatic contexts, with particular emphasis on tree-based ensemble artificial intelligence models, and concluded that such models effectively improve the explainability of decisions and unveil new knowledge [25]. However, the depth and breadth of these reviews can be further extended, and the present review can be seen as a complementary study. In this review, the advantages, drawbacks, and challenges of machine learning in water resource modeling are analyzed. We explore the broad spectrum of machine learning applications in water resource modeling, encompassing precipitation, flooding, urban waterlogging, runoff, evapotranspiration, soil moisture, groundwater level, and water quality. Our review is not limited to a specific type of machine learning algorithm; it covers a wider range of research, including link-based models (artificial neural networks (ANNs) and deep learning), tree-based models (decision tree (DT) and random forest (RF)), and statistical models (support vector machine (SVM) and logistic regression (LR)). New avenues for subsequent water resource research are also outlined, laying the foundation for further in-depth research.

2. Overview of Machine Learning

Machine learning has evolved significantly over the more than 60 years since the Dartmouth Conference in 1956, growing from a niche field into one of broadly recognized importance [26]. During this period, numerous algorithms have been developed, including linear regression, LR, DT, SVM, ANN, and deep learning models. As shown in Figure 2, the machine learning workflow mainly entails the following: (1) data collection, including structured data (such as tables in a database) and unstructured data (such as text, images, and audio) from multiple sources; (2) data organization, including data cleaning, data conversion, and feature extraction, after which the organized data can be stored in corresponding databases for easy use; (3) selection of a suitable model according to the requirements of the specific problem; and (4) deployment of the model in actual water resource management applications after model evaluation and optimization. The broad range of machine learning techniques has proven adequate to meet most demands [27]. However, researchers are often uncertain which algorithm is appropriate for a specific research question.

2.1. Choosing the Right Method for Different Problems

Depending on whether the samples are labeled, machine learning is broadly categorized into two main types: supervised and unsupervised learning [28]. Supervised learning is further divided into classification and regression problems, depending on whether the label values are discrete or continuous, respectively [29]. For linearly separable problems, models such as LR, linear discriminant analysis (LDA), the perceptron, and hard margin SVM tend to perform well. However, these models are ineffective for linearly inseparable problems. For such scenarios, nonlinear models, including DT, ANN, RF, AdaBoost, and kernel SVM, are suitable. In practice, sampled data often contain noise, and strictly separating samples of different categories may overfit the model, causing it to learn the noise [30]. Models that tolerate some degree of misclassification, such as the soft margin SVM, can therefore improve generalization performance [31,32]. To help researchers select appropriate methods for their own problems, the applicability, advantages, and disadvantages of the main machine learning algorithms are summarized in Table 1.
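The difference between linearly separable and inseparable problems shows up clearly in a perceptron, one of the linear models named above. This minimal sketch trains it on the logical AND pattern, which a straight line can separate, and on XOR, which no straight line can separate. The toy data are illustrative only and do not come from any hydrological study.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train weights w and bias b with the classic perceptron update rule.

    labels must be +1 or -1. Returns (w, b).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified or on the boundary: update
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def accuracy(w, b, samples, labels):
    """Fraction of samples whose predicted sign matches the label."""
    correct = 0
    for x, y in zip(samples, labels):
        activation = sum(wi * xi for wi, xi in zip(w, x)) + b
        pred = 1 if activation > 0 else -1
        if pred == y:
            correct += 1
    return correct / len(samples)


X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y_and = [-1, -1, -1, 1]  # linearly separable
y_xor = [-1, 1, 1, -1]   # linearly inseparable

w, b = train_perceptron(X, y_and)
acc_and = accuracy(w, b, X, y_and)
w, b = train_perceptron(X, y_xor)
acc_xor = accuracy(w, b, X, y_xor)
print(acc_and, acc_xor)
```

On AND the perceptron reaches perfect accuracy. On XOR at least one point always remains misclassified, which is why a nonlinear model is needed for such problems.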

2.2. Selection of Appropriate Assessment Methods for Different Issues

In machine learning modeling, a dataset is typically split at random into a training dataset and an independent testing dataset. During feature selection, the training dataset is further divided into a training subset and a validation subset [33]. After the optimal feature subset has been identified, the training and validation subsets are combined to train the final model. Appropriate indicators must then be chosen for evaluation according to the specific task. As shown in Table 2, for classification problems, indicators such as accuracy (ACC), sensitivity (Sn), specificity (Sp), the Matthews correlation coefficient (MCC), the F1 score, and the area under the ROC curve (AUC) are applicable. For regression tasks, indicators such as the coefficient of determination (R²), mean squared error (MSE), mean absolute error (MAE), and root mean squared error (RMSE) are appropriate. For clustering tasks, external indicators (Jaccard coefficient (JC), Fowlkes and Mallows index (FMI), Rand index (RI)) or internal indicators (Davies-Bouldin index (DBI), Dunn index (DI)) are suitable [34]. To assess model performance objectively, common methods include the hold-out, k-fold cross-validation, and bootstrap approaches [33]. With the hold-out method, it is recommended to repeat the random split several times. In each split, approximately 70–80% of the data is allocated for training and the remainder is set aside for testing, and the final evaluation is obtained by averaging all test results. In k-fold cross-validation, the training dataset is partitioned into k roughly equal segments; k-1 segments are used for training and the remaining segment serves as the test set. Each segment is used once as the test set, and the outcomes of the k trials are averaged to evaluate the model’s performance. If k equals the number of samples in the dataset, this approach becomes the leave-one-out method [35].
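The main classification and regression indicators in Table 2 can be computed directly from their definitions. The following minimal sketch covers binary classification, computed from confusion-matrix counts, and regression. It assumes all denominators are nonzero, omits AUC and the clustering indices, and takes R² as the coefficient of determination, 1 - SS_res/SS_tot.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Binary-classification indicators from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sn = tp / (tp + fn)          # sensitivity (recall, true positive rate)
    sp = tn / (tn + fp)          # specificity (true negative rate)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sn / (precision + sn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom
    return {"ACC": acc, "Sn": sn, "Sp": sp, "F1": f1, "MCC": mcc}


def regression_metrics(observed, predicted):
    """R² (coefficient of determination), MSE, MAE and RMSE."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {"R2": 1 - sse / ss_tot, "MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse)}


print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```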
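Additionally, for smaller datasets, the bootstrap method can generate a training set by resampling with replacement, and the samples that are never drawn can serve as the test set. This approach is particularly effective in training RF-based models, which fit each tree on a bootstrap sample.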
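The k-fold and bootstrap splitting schemes described above can be sketched as functions that return index lists rather than fitting any model. The function names are illustrative. Setting k equal to the number of samples in the same function yields leave-one-out splits.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Samples are shuffled once, then split into k folds whose sizes differ
    by at most one; each fold serves as the test set exactly once.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = idx[start:start + size]
        train_idx = idx[:start] + idx[start + size:]
        yield train_idx, test_idx
        start += size


def bootstrap_indices(n_samples, seed=0):
    """Draw a bootstrap training set by sampling with replacement.

    Samples never drawn form the out-of-bag (OOB) set, which can be used
    for testing; on average about 36.8% of samples end up out-of-bag.
    """
    rng = random.Random(seed)
    train_idx = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob_idx = sorted(set(range(n_samples)) - set(train_idx))
    return train_idx, oob_idx


folds = list(k_fold_indices(10, k=3))
print([len(test) for _, test in folds])  # fold sizes 4, 3, 3
train, oob = bootstrap_indices(10)
```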

3. Application of Machine Learning for Water Resource Modeling

This article reviews the application of machine learning in predicting various hydrological factors. In each section, the research progress, important conclusions, existing problems, and future development directions are discussed.

3.1. Precipitation Prediction

Precipitation is a crucial component of Earth’s water cycle and plays a significant role in maintaining the water and energy balance. Accurate precipitation prediction is vital for flood control and for guiding government decisions on water resource management. However, precipitation is highly uncertain in both time and space owing to variations in factors such as altitude, air humidity, and regional characteristics [36]. Precipitation forecasting, a pivotal research domain [37], predominantly employs two approaches. The first simulates the underlying physical laws through an analysis of precipitation processes. For instance, Hu and Wu gathered monthly data for 2010 to 2019 from 16 major meteorological stations in Jiangxi Province and, employing the M-K mutation test and the Kriging method, projected monthly cumulative rainfall in Jiangxi Province for 2020 to 2021 [38]. Wang et al. evaluated the vulnerability of China’s renewable energy production, particularly hydropower generation, by examining the long-term relationship between hydropower generation, climatic factors including precipitation, and the installed capacity of hydropower plants [39]. Gui and Shao partitioned the 1961 to 2015 annual precipitation record for Xianshan, using a 50-year period to compute the weights for each division stage, and forecasted the annual precipitation of Xianshan for 2011 to 2015 [40]. Although process-based precipitation prediction models enhance predictive capability to some extent, they still have limitations. Many studies have concentrated exclusively on the cyclical patterns of precipitation while overlooking trend-based forecasting. Owing to the complexity and variability of the weather factors influencing precipitation, traditional forecasting models struggle to deliver precise forecasts.

The second approach uses machine learning to predict precipitation. Dimri et al. extracted nine weather variables (maximum temperature, minimum temperature, dry bulb temperature, 24 h average wind speed, wind direction, wind speed, surface pressure, cloud cover, and cloud type) as features and used the K-nearest neighbor (KNN) method to predict rainfall three days in advance [41]. Ghazvinian et al. used the minimum and maximum temperatures, average relative humidity, wind speed, and sunshine hours from 2000 to 2018 as features to establish an ANN-based model, which effectively predicted the monthly rainfall in Semnan City [42]. The tree-based RF algorithm has also been applied to quantitative precipitation estimation (QPE), where it reduced errors and biases in estimated precipitation intensity, especially for heavy, solid, or mixed precipitation [43]. In addition, a statistical model, support vector regression (SVR), has been used to predict seasonal precipitation. Umirbekov et al. used SVR to investigate the long-term correlation between global climate change from February to June and the peak precipitation periods in the Tian Shan and Pamir Plateau of Central Asia [44]. Traditional machine learning methods can handle complex precipitation data, but they are often prone to local optima and overfitting, so further improvements are needed to enhance prediction accuracy. Kumar et al. used meridional monsoon rainfall and regional or global circulation parameters as features and employed a hybrid model combining long short-term memory (LSTM) and recurrent neural network (RNN) techniques to predict monthly rainfall in India from 1871 to 2016. However, the predictive performance of this model may suffer when precipitation data contain significant noise [45].
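The KNN approach used by Dimri et al. predicts a day’s rainfall as the average rainfall of the k most similar past days in feature space. The following is a minimal regression sketch of that idea. The two scaled features and all values are hypothetical stand-ins for the nine weather variables listed above; this is not the authors’ configuration.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest neighbours.

    train_x: list of feature vectors (e.g., scaled daily weather variables)
    train_y: matching targets (e.g., rainfall amount)
    Distances are Euclidean, so features should be on comparable scales.
    """
    distances = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical scaled features (e.g., humidity, cloud cover) and rainfall in mm
train_x = [(0.2, 0.8), (0.3, 0.7), (0.9, 0.1), (0.8, 0.2), (0.25, 0.75)]
train_y = [12.0, 10.0, 0.0, 1.0, 11.0]
pred = knn_predict(train_x, train_y, (0.22, 0.78), k=3)
print(pred)
```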
Recently, Shen and Ban used a coupled empirical model to decompose precipitation sequences. They fed the initial modal components obtained during decomposition into an SVM-based model and the remaining modal components into an LSTM-based model. This method significantly improved the consistency between observations and estimates and effectively reduced the impact of noise on the model [46]. Zhang et al. used 18 years (1999–2016) of monthly rainfall data from Zhongwei City as training samples and constructed a rainfall prediction model using CEEMDAN-PSO-ELM (complete ensemble empirical mode decomposition with adaptive noise, particle swarm optimization, and an extreme learning machine). The results showed that the hybrid model reduced the interference of feature information and non-stationarity across different intervals while achieving high generalization and prediction accuracy [47]. In summary, the continuous progress of machine learning has created new opportunities for precipitation prediction.

In September 2023, Huawei developed a new high-resolution global artificial intelligence weather forecasting system, the Pangu Weather Model [48]. It was the first such model to outperform traditional numerical methods in medium- and long-term weather forecasting. It also uses a hierarchical temporal aggregation strategy to reduce the number of forecast iterations, which reduces accumulated iteration error and improves both forecast accuracy and computation speed. The features considered in the model may also guide subsequent research: thirteen vertical pressure levels, five meteorological variables at each level (temperature, humidity, geopotential, and the zonal and meridional wind components), and four surface variables (2 m temperature, 10 m zonal and meridional wind components, and sea level pressure). However, owing to incomplete meteorological data, a lack of interpretability, and limited reliability in extreme weather prediction, the accuracy of machine learning models in precipitation prediction remains low and cannot yet meet the needs of operational meteorological applications. Researchers therefore urgently need to develop new algorithms to address these problems.
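The hierarchical temporal aggregation idea can be sketched as a greedy schedule. Following the Pangu-Weather publication, the sketch assumes separately trained models with 1 h, 3 h, 6 h, and 24 h steps. It always applies the largest step that still fits, so a 56 h forecast needs 5 model calls instead of 56 hourly ones. The sketch only plans the iterations; the forecast models themselves are not shown.

```python
def aggregation_schedule(lead_time_h, model_steps=(24, 6, 3, 1)):
    """Greedily decompose a forecast lead time into calls of fixed-step models.

    Larger-step models are used first, so fewer autoregressive iterations
    (and fewer opportunities for errors to accumulate) are needed.
    """
    schedule = []
    remaining = lead_time_h
    for step in sorted(model_steps, reverse=True):
        count, remaining = divmod(remaining, step)
        schedule.extend([step] * count)
    if remaining:
        raise ValueError("lead time cannot be composed from the available steps")
    return schedule


plan = aggregation_schedule(56)
print(plan, len(plan))  # [24, 24, 6, 1, 1] 5
print(len(aggregation_schedule(56, model_steps=(1,))))  # 56 calls with a 1 h model only
```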

3.2. Flood Forecasting

Flood prediction models are typically categorized into two types: process-driven and data-driven models [49]. Process-driven models rely on traditional rainfall–runoff hydrological models, including physical process simulations, and therefore have clear hydrological meaning. However, this type of modeling requires a substantial amount of hydrological data from the catchment and considerable empirical knowledge from researchers, and it also faces challenges such as parameter calibration. In contrast, machine learning methods do not require modeling hydrological processes; instead, they establish a reasonable relationship between input and output data [50,51,52].

As early as 1981, Xu et al. considered factors such as the maximum water deficit in the watershed, average daily rainfall, and watershed area, and established a flood forecasting model using multiple linear regression (MLR). MLR achieved higher accuracy than univariate regression [53]. Zhang et al. established a mixed linear regression model for flood forecasting in the Niyang River Basin based on hydrological data from the Jiangda, Baheqiao, and Gengzhang stations, and the mixed model was more accurate than MLR [54]. Latt et al. used the correlation between flood water level and rainfall to establish a stepwise multiple linear regression model and applied it to estimate the water level of the Chindwin River in Myanmar [55]. However, because linear regression is inherently linear, it struggles to analyze nonlinear data. Therefore, as machine learning technology has advanced, flood prediction models have progressively incorporated nonlinear models [56]. In 1995, Hsu et al. first employed backpropagation (BP) neural networks to predict floods based on rainfall and flow data measured by four rain gauges, achieving better results than other models [57]. Wang et al. used hourly rainfall data from 11 rainfall stations in the Ding’an River Basin and combined BP with the Xinanjiang model to predict floods, significantly improving the accuracy of flood prediction [14].
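Multiple linear regression of the kind used by Xu et al. fits a flood variable as a weighted sum of predictors. This is a minimal ordinary-least-squares sketch that solves the normal equations. The two predictors (event rainfall and an antecedent water deficit) and their values are hypothetical; they are generated from a known linear relation so that the fit can be checked.

```python
def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    Solves the normal equations (A^T A) b = A^T y with Gauss-Jordan
    elimination, where A is X with a leading column of ones.
    """
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented normal-equation matrix [A^T A | A^T y]
    M = [
        [sum(A[r][i] * A[r][j] for r in range(len(A))) for j in range(p)]
        + [sum(A[r][i] * y[r] for r in range(len(A)))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict_mlr(coefs, row):
    """Evaluate the fitted linear model for one row of predictors."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


# Hypothetical events: (rainfall in mm, antecedent water deficit); y = 1 + 0.5*x1 + 2*x2
X = [(10, 2), (20, 1), (15, 3), (30, 2), (25, 4)]
y = [10.0, 13.0, 14.5, 20.0, 21.5]
coefs = fit_mlr(X, y)
print(coefs, predict_mlr(coefs, (40, 1)))
```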

Although ANNs have been widely applied, they still have obvious drawbacks, including slow convergence, susceptibility to local optima, and vanishing gradients. Liong et al. trained an SVM-based model with a radial basis function kernel to forecast floods and monitor water levels in Dhaka, Bangladesh. They used the current daily water levels at five stations and accounted for the impact of lateral inflow on water levels at the downstream station [58]. To compensate for the limitations of SVR, Nguyen et al. introduced the V-SVR model. They used hourly rainfall and river segment data from six gauging stations at Liwu station for 2012–2018 and considered the source event (typhoon or storm), date of occurrence, rainfall duration, flood stage, and total rainfall. The model significantly reduced training time and improved the nonlinear fitting of the flood process [59].
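The SVM-based flood models above rely on the radial basis function (RBF) kernel. For regression, SVR pairs this kernel with an epsilon-insensitive loss. This sketch shows only those two ingredients and the resulting prediction function. Solving the SVR dual problem to obtain the coefficients is omitted, so the support vectors, dual coefficients, and bias in the example are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """SVR loss: errors within epsilon cost nothing, larger ones grow linearly."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_decision(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Kernel SVR prediction f(x) = sum_i (alpha_i - alpha_i*) K(x_i, x) + b.

    dual_coefs holds the already-trained differences (alpha_i - alpha_i*).
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical trained model with two support vectors (e.g., scaled upstream levels)
svs = [(0.2, 0.4), (0.7, 0.9)]
coefs = [1.5, -0.5]
print(svr_decision((0.3, 0.5), svs, coefs, bias=0.2))
```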

To meet the growing need for accurate and generalizable flood prediction, Lohani et al. trained an adaptive neuro-fuzzy inference system (ANFIS) using monthly mean inflow time series from the Bhakra Dam, obtaining more accurate results than ANN and autoregressive models [60]. RF is an ensemble model composed of decision trees. Li et al. employed RF to build a daily water level prediction model for Poyang Lake based on daily flow and water level observations at six hydrological stations. The RF model outperformed ANN, SVM, and LR in predictive performance and offered stronger interpretability [61].
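The core RF mechanism, averaging many trees that are each trained on a bootstrap sample with a random choice of features, can be sketched in a deliberately simplified form. Here each "tree" is a depth-one regression stump, and feature randomness is reduced to picking one random feature per tree. Full RF implementations grow deeper trees and sample candidate features at every split. The data are hypothetical: feature 0 stands in for an informative upstream flow index, and feature 1 is uninformative.

```python
import random


def fit_stump(X, y, feature):
    """Best single split on one feature, minimising squared error.

    Returns (feature, threshold, left_mean, right_mean).
    """
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    if best is None:  # feature is constant in this sample: predict the mean
        mean = sum(y) / len(y)
        return feature, float("inf"), mean, mean
    return feature, best[1], best[2], best[3]


def fit_forest(X, y, n_trees=50, seed=0):
    """Bagging: each stump sees a bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(d)                  # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all stumps."""
    preds = [lm if row[f] <= thr else rm for f, thr, lm, rm in forest]
    return sum(preds) / len(preds)


X = [(i, (i * 7) % 5) for i in range(20)]
y = [0.0 if i < 10 else 10.0 for i in range(20)]
forest = fit_forest(X, y)
print(predict_forest(forest, (2, 4)), predict_forest(forest, (17, 4)))
```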

The convolutional neural network (CNN) is a pioneering deep learning algorithm with powerful representation learning capability. Kabir et al. used a CNN to predict flood depth and inundation, and it performed significantly better than SVR [62]. Hosseny et al. validated, processed, and summarized the monthly average inflow time series of the Bhakra Dam using HYMOS software and built U-NetRiver, an enhanced CNN-based model. U-NetRiver identified river morphology and flood volume in floodplains, improving the prediction accuracy of flood depth by 29% [63]. Hu et al. developed the LSTM-ROM model by combining LSTM with a reduced order model (ROM), which reduced the dimensionality of the spatial dataset and efficiently represented the spatial and temporal distributions of floods [64]. Additionally, Google introduced an innovative inundation modeling method, the Morphological Inundation Model (MIM). This model takes the water level at a specific point in the river (a water level gauge) as input and outputs the water levels at all points in the river. By combining physics-based modeling with machine learning, it generates more accurate and scalable inundation models in real-world scenarios [65].

Urban Waterlogging Prediction

Extreme precipitation has become more frequent and intense with the increasing frequency of extreme weather events and accelerating urbanization worldwide. Consequently, rainfall-induced flooding is becoming a persistent risk [66]. Urban expansion has increased population density and economic activity, raising the hazards posed by inland waterlogging. The impacts of heavy-rainfall flooding on the construction and operation of Chinese cities are increasingly evident. Therefore, rapid prediction and early warning of urban flooding caused by heavy rainfall are essential to improve the disaster response capacity of the water sector [67].

Conventional prediction methods include empirical, physical, and spatial models that rely on geographic information [68]. For instance, Krupka created a rapid flood inundation model using a digital elevation model (DEM). The model first delineates individual depressions and then tracks the flood level and extent at each location by calculating flow direction and flow accumulation on the filled DEM. It is suitable for flood prediction and flood risk management in urban areas [69]. Zhang and Pan proposed an urban storm inundation simulation method based on a distributed hydrological model built with a geographic information system (GIS) [70]. Huang et al. developed InfoWorks Integrated Watershed Management (IWM), an integrated urban watershed drainage model, to calculate the depth and extent of inundation quickly and effectively [71]. Zeng et al. developed a model to simulate urban inundation during heavy rainfall events, laying the groundwork for future research on early warning systems for urban floods [72]. The traditional prediction models currently in use require long computation times, which increase further for large-scale or high-spatial-resolution modeling tasks [73]. Despite improvements in physical models for large-scale urban modeling, state-of-the-art physical prediction methods remain relatively inefficient, especially in studies that require iterative data analysis [74].
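Krupka’s model computes flow direction and flow accumulation on a filled DEM. The widely used D8 scheme is one common way to do this. The sketch assumes D8 (the text does not say which routing scheme Krupka used), skips depression filling, and uses a toy 3 × 3 DEM. Each cell drains to its steepest strictly-lower neighbour, and accumulation counts how many cells drain through each cell.

```python
import math

# The eight neighbour offsets used by the D8 scheme.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_direction(dem):
    """For each cell, return the (row, col) of the steepest downslope neighbour,
    or None if no neighbour is lower (a pit or outlet)."""
    rows, cols = len(dem), len(dem[0])
    direction = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_slope = 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    distance = math.hypot(dr, dc)  # 1 for edges, sqrt(2) for diagonals
                    slope = (dem[r][c] - dem[nr][nc]) / distance
                    if slope > best_slope:
                        best_slope = slope
                        direction[r][c] = (nr, nc)
    return direction


def flow_accumulation(dem, direction):
    """Count how many cells (including itself) drain through each cell.

    Cells are processed from highest to lowest, so each upstream cell has
    passed on its total before its downstream neighbour is visited.
    """
    rows, cols = len(dem), len(dem[0])
    acc = [[1] * cols for _ in range(rows)]
    order = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)),
                   reverse=True)
    for _, r, c in order:
        target = direction[r][c]
        if target is not None:
            tr, tc = target
            acc[tr][tc] += acc[r][c]
    return acc


dem = [[9, 8, 7],
       [8, 7, 6],
       [7, 6, 5]]
acc = flow_accumulation(dem, d8_flow_direction(dem))
print(acc)  # [[1, 1, 1], [1, 2, 3], [1, 3, 9]]
```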

Because of the limitations of physical models, researchers have developed rainstorm waterlogging early warning and forecasting methods based on time series analysis. These methods are relatively simple to implement and require only a limited understanding of hydrological and hydraulic principles. Zheng et al. improved the spatiotemporal autoregressive moving average model of rainfall and waterlogging by considering the effects of rainfall, confluence, and drainage. Their model incorporates the confluence area, ground structure, and drainage conditions of each ponding point for the short-term prediction of urban rainstorm waterlogging [75]. In recent years, many scholars have used machine learning to predict rainstorm and flood disasters. Lai et al. applied a self-organizing map (SOM) and an artificial neural network (ANN) to classify and assess flood risk in 56 low-lying areas in Beijing [76]. Building on this study, Yan et al. used monitoring data, topographic data, Jinlong River data, and local rainfall statistics as features and developed two SVM-based models to predict rainstorm waterlogging and maximum flood depth [77]. Wu et al. used historical rainfall data as the model input and accumulated water depth as the output to relate rainfall to ponding volume. They applied the gradient boosting decision tree (GBDT) algorithm to predict the ponding process during urban rainstorms [78]. Li et al. developed a rainstorm waterlogging disaster prediction model using extreme gradient boosting (XGBoost) and showed that it outperformed the BP neural network [79]. Wang et al. selected altitude, slope, distance from watercourses, road density, vegetation coverage, soil water retention, and impervious surface percentage as spatial variables and designed a framework for assessing urban flood risk using a weighted naive Bayes (WNB) classifier and a complex network model (CNM) [80]. Although machine learning is often used to predict rainstorm-induced flood inundation, most studies focus on predicting the flood level at a single point. Recently, many researchers have combined numerical simulation models with machine learning methods to achieve faster and more accurate urban flood prediction [81,82].
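The GBDT idea used by Wu et al. can be sketched with regression stumps as weak learners under squared-error loss: each new stump is fitted to the current residuals, and a shrunken copy is added to the model. XGBoost extends this with regularization and second-order gradient information, which the sketch omits. The rainfall and ponding-depth pairs are hypothetical and are not data from [78].

```python
def fit_stump_1d(x, residuals):
    """Best threshold split of a single feature for squared error.

    Requires at least two distinct x values. Returns (threshold, left, right).
    """
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= thr]
        right = [r for xi, r in zip(x, residuals) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1], best[2], best[3]


def fit_gbdt(x, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared error with stumps as weak learners."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of 0.5 * (y - f)^2 with respect to f
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        thr, left, right = fit_stump_1d(x, residuals)
        stumps.append((thr, left, right))
        pred = [p + learning_rate * (left if xi <= thr else right)
                for p, xi in zip(pred, x)]
    return base, stumps, learning_rate


def predict_gbdt(model, xi):
    base, stumps, learning_rate = model
    return base + sum(learning_rate * (left if xi <= thr else right)
                      for thr, left, right in stumps)


rain = [5, 10, 20, 30, 40, 50, 60, 80]  # hypothetical event rainfall (mm)
depth = [0, 0, 2, 5, 9, 14, 20, 31]     # hypothetical ponding depth (cm)
model = fit_gbdt(rain, depth)
print([round(predict_gbdt(model, r), 1) for r in rain])
```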

Among the flood prediction models discussed, MIM [65] currently stands out as the most prominent. In flood forecasting, precise measurement of inundation depth and delineation of the inundated area are crucial inputs that flood forecasting models use to assess risk levels and predict flooded areas. However, earlier machine learning models faced challenges in modeling inundation at large scales, such as high computational cost due to the extensive areas involved and the required resolution. In addition, many global elevation maps lack riverbed bathymetry, which is crucial for accurate modeling. Moreover, errors in the available data, such as instrument measurement errors or missing features in elevation maps, must be identified and corrected.

3.3. Runoff Prediction

Milly et al. suggested using runoff as an indicator of sustainable water availability in natural environments [83]. Changes in sustainable water supply can have substantial regional impacts on economies and ecosystems. Runoff forecasting plays a crucial role in optimizing water resource systems and mitigating the impacts of destructive natural disasters, such as floods and droughts, through both long-term planning and short-term emergency warnings [84,85,86]. Because runoff generation is complex and its mechanisms are difficult to understand, machine learning-based runoff prediction models offer an effective solution [87]. By time scale, runoff prediction can be divided into medium- and long-term prediction and short-term forecasting [88].

3.3.1. Medium- and Long-Term Runoff Prediction

Medium- and long-term runoff prediction is vital for developing water resource scheduling plans that span extended periods, and it therefore significantly affects water resource management [89]. Ghumman et al. developed an ANN-based runoff model using continuously measured monthly rainfall and runoff data and applied it to predict monthly runoff in the Hoab River Basin of Pakistan [90]. ANNs can perform comparably to traditional conceptual models without requiring a detailed investigation of the hydrological and geological parameters of the catchment. Nevertheless, datasets frequently contain substantial noise and errors, which can impede efficient and accurate prediction by ANN-based models. Moreover, owing to climate change and human activities, natural runoff often contains multiple frequency components, which traditional ANN-based models struggle to capture efficiently.
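A minimal sketch of an ANN-based rainfall-runoff model is a network with one hidden layer of tanh units, trained by backpropagation with stochastic gradient descent. The inputs (current and previous month’s scaled rainfall), the synthetic targets, and all hyperparameters are hypothetical and are not taken from Ghumman et al.

```python
import math
import random


def train_mlp(inputs, targets, hidden=4, epochs=2000, lr=0.05, seed=1):
    """Train a 1-hidden-layer network (tanh hidden units, linear output)
    with stochastic gradient descent on squared error. Returns a predictor."""
    rng = random.Random(seed)
    n_in = len(inputs[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
             for j in range(hidden)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    for _ in range(epochs):
        for x, t in zip(inputs, targets):
            h, y = forward(x)
            dy = y - t  # derivative of 0.5 * (y - t)^2 with respect to y
            for j in range(hidden):
                dh = dy * w2[j] * (1 - h[j] ** 2)  # backprop through tanh, old w2[j]
                w2[j] -= lr * dy * h[j]
                b1[j] -= lr * dh
                w1[j] = [w - lr * dh * xi for w, xi in zip(w1[j], x)]
            b2 -= lr * dy
    return lambda x: forward(x)[1]


# Hypothetical scaled rainfall (current, previous month) and synthetic runoff
inputs = [(0.1, 0.2), (0.4, 0.1), (0.8, 0.5), (0.6, 0.9), (0.3, 0.3), (0.9, 0.8)]
targets = [0.5 * a ** 2 + 0.3 * b for a, b in inputs]
predict = train_mlp(inputs, targets)
mse = sum((predict(x) - t) ** 2 for x, t in zip(inputs, targets)) / len(inputs)
print(mse)
```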

Tan et al. used multi-year runoff data to build an ANN-based runoff model combined with ensemble empirical mode decomposition (EEMD). The proposed EEMD-ANN model significantly improved the accuracy of the ANN-based method in predicting medium- and long-term runoff time series [91]. Liao proposed a hybrid framework for long-term runoff prediction that uses antecedent inflow and selected meteorological factors (such as precipitation, evapotranspiration, solar radiation, and soil temperature) as input features, and showed that combining EEMD and ANN improved accuracy [92]. The main challenge in medium- and long-term runoff prediction is low accuracy, arising from the long prediction period and the complexity of runoff generation mechanisms. Consequently, identifying key factors is critically important. Han et al. introduced an LSTM-based model, AT-LSTM, that combines dual attention mechanisms in the input and hidden layers. The model uses rainfall, potential evapotranspiration, and monthly climate phenomenon index data (including 88 atmospheric circulation indicators, 26 sea surface temperature indicators, and 16 other indicators) as inputs for long-term runoff prediction. AT-LSTM effectively improved the accuracy of long-term prediction and identified the dynamic effects of the input factors [93].
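The EEMD-ANN hybrids follow a decomposition-ensemble pattern: split the series into components, forecast each component separately, and sum the component forecasts. This sketch shows that pattern with two deliberate simplifications. A centred moving average replaces EEMD, which would instead produce several intrinsic mode functions, and simple linear extrapolation and AR(1) models replace the ANN. It illustrates the structure only, not the accuracy of EEMD-based models; the runoff series is hypothetical.

```python
def moving_average_split(series, window=3):
    """Split a series into a smooth component (centred moving average that
    shrinks at the edges) and a residual, so that trend + residual == series.
    A crude stand-in for EMD/EEMD."""
    n = len(series)
    half = window // 2
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    residual = [s - t for s, t in zip(series, trend)]
    return trend, residual


def forecast_trend(trend):
    """One-step linear extrapolation of the smooth component."""
    return trend[-1] + (trend[-1] - trend[-2])


def forecast_ar1(residual):
    """One-step AR(1) forecast with coefficient fitted by least squares."""
    num = sum(a * b for a, b in zip(residual[1:], residual[:-1]))
    den = sum(b * b for b in residual[:-1])
    phi = num / den if den else 0.0
    return phi * residual[-1]


def decomposition_forecast(series, window=3):
    """Decompose, model each component separately, then recombine by summation."""
    trend, residual = moving_average_split(series, window)
    return forecast_trend(trend) + forecast_ar1(residual)


series = [3.1, 4.0, 5.2, 4.8, 6.1, 7.0, 6.6, 8.2, 9.0, 8.7]  # hypothetical runoff
print(decomposition_forecast(series))
```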

3.3.2. Short-Term Runoff Prediction

Zealand et al. examined the efficacy of ANNs for short-term runoff prediction, with results demonstrating that ANN-based models consistently outperform traditional models [94]. Kratzert used meteorological forcing data (maximum air temperature, precipitation, radiation, vapor pressure, etc.) and static catchment characteristics (aridity index, mean PET, maximum water content, geological permeability, forest fraction, etc.) as features to build an LSTM-based model for daily runoff prediction, and the results surpassed those of a well-established physically based model [84]. Based on data from 98 rainfall–runoff events, Hu et al. compared ANN-based and LSTM-based models for simulating runoff during these events at lead times of 1 to 6 h and found that the LSTM-based model outperformed the ANN-based models [95]. Gao et al. developed short-term runoff prediction models using LSTM and gated recurrent unit (GRU) networks, based on hourly flow measurements from one runoff station and hourly rainfall data from four rainfall stations [96]. In their experiments, the GRU model required the least training time and had a simpler structure, making it the preferred method for short-term runoff prediction in that study.
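The GRU's simpler structure can be made concrete by counting trainable weights: an LSTM layer has four gated transformations (input, forget, cell candidate, output), while a GRU has three (update, reset, candidate). The sketch assumes one bias vector per gate; some frameworks (for example PyTorch) keep two bias vectors per gate, so their reported counts are slightly larger. The input and hidden sizes in the usage line are hypothetical.

```python
def lstm_params(n_in: int, n_hidden: int) -> int:
    """Trainable weights in one LSTM layer (one bias vector per gate).

    Each of the 4 gates has an input matrix (n_hidden x n_in),
    a recurrent matrix (n_hidden x n_hidden), and a bias (n_hidden).
    """
    return 4 * (n_hidden * (n_in + n_hidden) + n_hidden)


def gru_params(n_in: int, n_hidden: int) -> int:
    """Same count for one GRU layer, which has 3 gated transformations."""
    return 3 * (n_hidden * (n_in + n_hidden) + n_hidden)


# Hypothetical: 5 input series (1 flow + 4 rainfall gauges), 32 hidden units.
print(lstm_params(5, 32), gru_params(5, 32))  # 4864 3648
```

Under this convention a GRU layer always has exactly three quarters of the LSTM's weights for the same sizes, which is one reason it tends to train faster.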

Despite achieving high accuracy in runoff prediction, no model can eliminate predictive uncertainty, owing to issues such as noisy or incomplete data. To improve prediction accuracy and reduce data dependency, Naganna et al. used runoff time series from the Cauvery River in India. They applied a deep learning technique (CNN) and ensemble tree methods (RF and Gradient Tree Boosting (GTB)), using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to select the ideal input parameters for daily runoff prediction in multiple catchments of the Cauvery River basin. The results indicated that the CNN, with inputs selected by AIC and BIC, achieved excellent prediction accuracy [97].
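AIC and BIC trade goodness of fit against model size, which is what makes them usable for choosing how many lagged inputs to feed a model. The sketch below applies them to a simpler problem than Naganna et al.'s: selecting the lag order of a linear autoregressive runoff model. The Gaussian-likelihood forms used here (AIC = n ln(RSS/n) + 2k, BIC = n ln(RSS/n) + k ln n, up to a constant) and the synthetic series are assumptions for illustration.

```python
import numpy as np


def ar_information_criteria(x: np.ndarray, max_lag: int) -> dict:
    """AIC and BIC of least-squares AR(p) models for p = 1..max_lag.

    Every model is fitted on the same target sample x[max_lag:], so the
    criteria are directly comparable; k counts the intercept plus p lags.
    """
    y = x[max_lag:]
    n = len(y)
    results = {}
    for p in range(1, max_lag + 1):
        # Column j holds lag j: element i is x[max_lag + i - j].
        lags = [x[max_lag - j: len(x) - j] for j in range(1, p + 1)]
        design = np.column_stack([np.ones(n)] + lags)
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        k = p + 1
        results[p] = {
            "aic": n * np.log(rss / n) + 2 * k,
            "bic": n * np.log(rss / n) + k * np.log(n),
        }
    return results


# Synthetic series generated by an AR(2) process.
rng = np.random.default_rng(0)
x = np.zeros(600)
for t in range(2, 600):
    x[t] = 0.6 * x[t - 1] - 0.5 * x[t - 2] + rng.normal()
crit = ar_information_criteria(x, max_lag=6)
best_bic = min(crit, key=lambda p: crit[p]["bic"])
print(best_bic)
```

BIC's per-parameter penalty (ln n) exceeds AIC's (2) once n is larger than about 7, so BIC generally favors more parsimonious input sets.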

In summary, many features influence runoff, with rainfall, evapotranspiration, temperature, and radiation being the most important. Among the studies reviewed here, the AT-LSTM model with dual attention mechanisms performed best for medium- and long-term runoff prediction, while the CNN-based model with AIC- and BIC-selected inputs was the most effective for short-term runoff prediction. Global climate change is affecting the hydrological cycle in many ways, which in turn influences hydrological processes within watersheds. Non-stationary hydrological sequences, shaped by climatic and meteorological conditions, subsurface conditions, and human activities, pose new challenges for hydrological forecasting and related research [98]. Under non-stationary conditions, traditional runoff forecasting methods cannot be applied directly. Future work should therefore explore the underlying physical mechanisms, integrate machine learning methods, and incorporate hydrological–meteorological information to establish runoff prediction models suited to non-stationary conditions.

3.4. Soil Moisture Prediction

Soil moisture is a critical variable in the climate system, influencing numerous forms of feedback at local, regional, and global scales and playing a significant role in climate change prediction [99]. It is essential for agricultural production and hydrological cycle processes, and accurate prediction is vital for the efficient use and management of water resources [100,101]. Soil moisture displays high variability in both spatial and temporal dimensions, nonlinearly influencing a variety of environmental processes [102,103]. Machine learning can provide deeper insights into complex earth science processes and enhance the predictability of seasonal forecasts and long-range, spatially correlated simulations across multiple temporal scales [104,105].

Chai et al. introduced the first ANN-based model for soil water prediction. The model, an ANN with a single hidden layer of 20 neurons using dual-polarized brightness temperatures as inputs, successfully retrieved soil moisture at 1 km spatial resolution over a 40 km × 40 km study area, demonstrating that ANNs can predict the temporal evolution of soil moisture with reasonable accuracy [106]. However, the model's applicability is constrained by its heavy dependence on accurate estimates of the mean and standard deviation of the data. Ahmad et al. developed an SVR-based model for estimating soil moisture from remotely sensed data, including backscatter and incidence angles from the Tropical Rainfall Measuring Mission (TRMM) and the Normalized Difference Vegetation Index (NDVI) from the Advanced Very High-Resolution Radiometer (AVHRR) [107].
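The architecture described for Chai et al. (one hidden layer of 20 neurons, two brightness-temperature inputs) can be sketched as below. The tanh activation, learning rate, full-batch gradient descent, and the synthetic brightness-temperature to soil-moisture relationship are assumptions. The standardization step shows concretely why such a model depends on the mean and standard deviation of its training data: those statistics are baked into every later prediction.

```python
import numpy as np


def train_single_hidden_layer_ann(X, y, n_hidden=20, lr=0.02, epochs=500, seed=0):
    """Train a one-hidden-layer tanh network by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    # Standardize with training statistics; predict() reuses them unchanged.
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sigma
    W1 = rng.normal(scale=0.5, size=(Z.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)                          # hidden activations
        err = H @ w2 + b2 - y
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / len(y)                        # d(MSE)/d(prediction)
        g_hidden = np.outer(g_out, w2) * (1.0 - H ** 2)   # backpropagate through tanh
        w2 = w2 - lr * (H.T @ g_out)
        b2 = b2 - lr * g_out.sum()
        W1 = W1 - lr * (Z.T @ g_hidden)
        b1 = b1 - lr * g_hidden.sum(axis=0)

    def predict(X_new):
        return np.tanh(((X_new - mu) / sigma) @ W1 + b1) @ w2 + b2

    return predict, losses


# Hypothetical H/V brightness temperatures (K) and a made-up soil moisture relation.
rng = np.random.default_rng(1)
tb = rng.uniform([200.0, 220.0], [260.0, 280.0], size=(300, 2))
sm = 0.45 - 0.003 * (tb[:, 0] - 200) - 0.001 * (tb[:, 1] - 220) + rng.normal(0, 0.005, 300)
predict, losses = train_single_hidden_layer_ann(tb, sm)
print(losses[0], losses[-1])
```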

While SVR can be used effectively for soil moisture prediction, its utility is constrained by short forecast horizons and high uncertainty. Data assimilation techniques have proven particularly effective in improving soil moisture prediction [108,109,110]. Liu et al. used meteorological parameters, including air temperature, relative humidity, solar radiation, and soil temperature at 5 cm and 20 cm, and employed a data assimilation method combining SVM and the Ensemble Kalman Filter (EnKF) to forecast soil moisture at six different layers in the Meilin area. The validation results indicated that the proposed SVM-EnKF model improved soil moisture predictions at layers from the surface to the root zone [111]. In agricultural and environmental management, soil moisture prediction is typically considered across various time scales. Elsaadani et al. employed meteorological forcing data, including longwave and shortwave radiation and relative humidity, along with land-surface variables such as rainfall and groundwater runoff, to predict soil moisture 1 to 3 h ahead using a spatiotemporal CNN-LSTM network [112]. Although LSTM predicted soil moisture effectively over short time scales, its accuracy declined notably as the forecast period lengthened. To address this issue, Gao et al. explored deep bidirectional long short-term memory (Bid-LSTM) networks and ANNs for soil moisture prediction. Using data such as air temperature, air humidity, and average solar radiation, they demonstrated that Bid-LSTM performs better in predicting soil moisture one month in advance [113]. Datta et al. proposed a multi-head LSTM model that takes as input soil moisture time series aggregated at different temporal scales. The results indicated that the multi-head LSTM method was effective in predicting soil moisture for the upcoming month [114].
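The data assimilation half of an SVM-EnKF scheme is the EnKF analysis step, which nudges an ensemble of model forecasts toward observations using the ensemble's own covariance. The sketch below implements the standard stochastic (perturbed-observation) EnKF update. In a hybrid scheme the forecast ensemble would come from the data-driven model; here a synthetic six-layer ensemble with only the surface layer observed stands in, and all sizes and error levels are assumptions.

```python
import numpy as np


def enkf_analysis(ensemble, obs, obs_std, H, rng):
    """Stochastic EnKF analysis step with perturbed observations.

    ensemble: (n_state, n_members) forecast states, e.g. soil moisture per layer
    obs:      (n_obs,) observed values
    obs_std:  observation-error standard deviation (same for all observations)
    H:        (n_obs, n_state) linear observation operator
    """
    n_members = ensemble.shape[1]
    anomalies = ensemble - ensemble.mean(axis=1, keepdims=True)
    P = anomalies @ anomalies.T / (n_members - 1)       # sample forecast covariance
    R = np.eye(len(obs)) * obs_std ** 2                  # observation-error covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)         # Kalman gain
    perturbed_obs = obs[:, None] + rng.normal(0.0, obs_std, size=(len(obs), n_members))
    return ensemble + K @ (perturbed_obs - H @ ensemble)


# Hypothetical setup: six soil layers, 200 members, only the surface layer observed.
rng = np.random.default_rng(42)
common = rng.normal(0.20, 0.05, size=200)                # shared wetness signal
prior = common + rng.normal(0.0, 0.01, size=(6, 200))   # layer-specific deviations
H = np.zeros((1, 6))
H[0, 0] = 1.0
posterior = enkf_analysis(prior, np.array([0.30]), 0.02, H, rng)
print(prior.mean(axis=1).round(3), posterior.mean(axis=1).round(3))
```

Because the layers are correlated in the prior ensemble, a surface observation also shifts the unobserved deeper layers, which is how assimilation can improve root-zone estimates.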

Traditional models such as ANN and SVR struggle to analyze complex inputs, random features, and the interrelations of climatic and hydrological properties, which limits their ability to capture important temporal and seasonal behaviors [115]. To overcome this challenge, Prasad et al. used meteorological data, soil characteristics, vegetation greenness, solar irradiance, and albedo to develop a hybrid multivariate time series model combining EEMD, Boruta feature selection, and an Extreme Learning Machine (ELM). The model achieved relatively low error and high performance in predicting weekly soil moisture [116]. Using the Soil Moisture Active Passive satellite dataset [117], Jamei et al. devised a hybrid framework that combines Boruta-GBDT feature selection with multivariate variational mode decomposition (MVMD) for predicting surface soil moisture [118]. Prediction accuracy improved significantly, making this model the best-performing approach for soil moisture prediction among the studies reviewed here.
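The ELM used as the learner in hybrids like Prasad et al.'s trains very quickly because only its output layer is fitted: input weights and biases are drawn at random once, and the output weights are the least-squares solution. The sketch below shows only this ELM component (not the EEMD or Boruta stages); the hidden size, tanh activation, weight scales, and the smooth synthetic target are assumptions.

```python
import numpy as np


class ExtremeLearningMachine:
    """Single-hidden-layer ELM: random input weights, least-squares output weights."""

    def __init__(self, n_hidden: int = 50, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)
        self.W = self.b = self.beta = None

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ExtremeLearningMachine":
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.normal(scale=2.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(scale=1.0, size=self.n_hidden)
        # Output weights: least-squares (pseudo-inverse) solution in one step.
        self.beta, *_ = np.linalg.lstsq(self._hidden(X), y, rcond=None)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta


x = np.linspace(-1.5, 1.5, 200).reshape(-1, 1)
y = np.sin(2 * x[:, 0])  # smooth stand-in for a weekly soil moisture signal
elm = ExtremeLearningMachine(n_hidden=50).fit(x, y)
mse = float(np.mean((elm.predict(x) - y) ** 2))
print(mse)
```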

In summary, the factors affecting soil moisture are complex, with temperature, humidity, and solar radiation being the most critical. Selecting optimal input data for soil moisture prediction remains an obstacle that must be addressed with advanced feature selection methods. Hybrid models integrate the advantages of multiple models and can effectively handle data with multi-scale, non-stationary behavior, making them one of the main research directions for future soil moisture prediction.

3.5. Evapotranspiration Prediction

Evapotranspiration (ET) is a critical component of the hydrological cycle and plays a significant role in climate–soil–vegetation interactions [119]. Accurate prediction of ET is crucial for applications including hydrological water balance studies, irrigation system design and management, crop yield simulation, and water resource planning and management [120,121]. Kumar et al. employed an ANN to estimate daily reference crop evapotranspiration (ET0). Daily climatic data, including solar radiation, maximum and minimum temperatures, maximum and minimum relative humidity, and wind speed, were used as inputs, and the results indicated that the ANN predicts ET0 more accurately than the traditional method [122]. Adamala et al. implemented a second-order neural network (SONN) to predict ET0 in different climatic zones of India. The predictions were based on daily minimum and maximum air temperature, minimum and maximum relative humidity, wind speed, and solar radiation at 17 locations, and the comparison showed that the SONN model can be successfully applied to ET0 prediction and outperforms the feed-forward back-propagation neural network (FFBP-NN) model [123]. Antonopoulos et al. used an ANN to simulate daily evapotranspiration from five years of daily meteorological data, including temperature, solar radiation, wind speed, and humidity, recorded at a single weather station in Greece [124].
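ET0 values for training and evaluating models like these are commonly computed with the FAO-56 Penman–Monteith equation, which combines the same variables the ANNs take as inputs (temperature, humidity, wind speed, radiation). The sketch below implements the daily form of that equation, taking net radiation and actual vapour pressure as given rather than deriving them from sunshine hours and relative humidity. The input values in the usage line are illustrative.

```python
import math


def sat_vapour_pressure(t_c: float) -> float:
    """Saturation vapour pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def fao56_et0(t_max, t_min, rn, u2, ea, pressure_kpa=101.3, g=0.0):
    """Daily reference evapotranspiration ET0 (mm/day), FAO-56 Penman-Monteith.

    rn, g: net radiation and soil heat flux (MJ m-2 day-1)
    u2:    wind speed at 2 m (m/s); ea: actual vapour pressure (kPa)
    """
    t_mean = (t_max + t_min) / 2
    es = (sat_vapour_pressure(t_max) + sat_vapour_pressure(t_min)) / 2
    delta = 4098 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2  # slope of es curve
    gamma = 0.665e-3 * pressure_kpa                                       # psychrometric constant
    radiation_term = 0.408 * delta * (rn - g)
    aero_term = gamma * 900 / (t_mean + 273) * u2 * (es - ea)
    return (radiation_term + aero_term) / (delta + gamma * (1 + 0.34 * u2))


et0 = fao56_et0(t_max=21.5, t_min=12.3, rn=13.28, u2=2.078, ea=1.409, pressure_kpa=100.1)
print(round(et0, 2))  # 3.88
```

A data-driven model replaces this closed-form expression with a learned mapping, which is useful when some of these inputs (for example radiation) are missing at a station.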

However, traditional ANNs lose information about the order of the input sequence. Additionally, the data preprocessing these models require, such as singular spectrum analysis of the time series, involves complex procedures [125]. Furthermore, traditional ANNs suffer from exploding or vanishing gradients, whereas LSTM effectively mitigates the vanishing gradient problem [126]. Chen et al. used incomplete meteorological data from the Northeast China Plain, including daily maximum air temperature, daily minimum air temperature, daily average relative humidity, daily extraterrestrial radiation, and solar radiation, to estimate daily reference evapotranspiration with three deep learning models: a DNN, a Temporal Convolutional Network (TCN), and an LSTM. These were evaluated against two classical machine learning models (SVM and RF) and empirical equations, and the results indicated that the TCN and LSTM models outperformed both the classical machine learning methods and the empirical equations [127]. Karbasi et al. introduced an automatic encoding–decoding bidirectional long short-term memory model (AED-BiLSTM) using daily minimum temperature, daily maximum temperature, wind speed, sunshine hours, and daily precipitation as inputs. The results showed that AED-BiLSTM had higher forecasting ability and accuracy than the general regression neural network (GRNN) and XGBoost models at the three meteorological stations studied [128].
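The core operation of a temporal convolutional network is the causal, dilated 1-D convolution: each output depends only on current and past inputs, and stacking layers with doubling dilation widens the visible history exponentially. The sketch below shows that operation alone; real TCN blocks add nonlinearities, multiple channels, and residual connections, which are omitted here. The kernel values and dilation schedule are assumptions.

```python
import numpy as np


def causal_dilated_conv(x: np.ndarray, kernel: np.ndarray, dilation: int) -> np.ndarray:
    """1-D causal convolution: y[t] = sum_j kernel[j] * x[t - j*dilation].

    Positions before the start of the series are treated as zeros (left padding),
    so y[t] never depends on future inputs.
    """
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(kernel[j] * xp[pad + t - j * dilation] for j in range(k))
        for t in range(len(x))
    ])


def receptive_field(kernel_size: int, dilations) -> int:
    """Time steps visible to the top of a stack of causal dilated convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)


signal = np.zeros(32)
signal[10] = 1.0                                   # unit impulse at t = 10
out = signal
for d in (1, 2, 4, 8):                             # dilation doubles at each layer
    out = causal_dilated_conv(out, np.array([0.5, 0.5]), d)
print(np.nonzero(out)[0].min(), np.nonzero(out)[0].max())  # 10 25
print(receptive_field(2, [1, 2, 4, 8]))                     # 16
```

The impulse never reaches outputs before t = 10 (causality) and spreads over exactly 16 steps, matching the receptive field of four layers with kernel size 2.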

A burgeoning research trend is the use of hybrid computational models for predicting evapotranspiration. Hybrid heuristic algorithms, an innovative approach in machine learning, have the potential to significantly improve the prediction accuracy of ET0 [129]. Initially, researchers concentrated on combining novel heuristic search algorithms with soft computing methods to optimize control parameters and improve predictive accuracy [130]. To overcome the lack of comprehensive models applicable to all climates and the limited availability of meteorological information, Maroufpoor introduced a hybrid ANN-grey wolf optimization (ANN-GWO) model that predicts ET0 from maximum and minimum air temperature, relative humidity, wind speed, sunshine hours, and precipitation [131]. Roy et al. used daily maximum and minimum temperatures, wind speed, relative humidity, sensible heat flux, latent heat, and insolation as input features for ANFIS-based hybrid models tuned by four optimization algorithms, namely biogeography-based optimization (BBO-ANFIS), the firefly algorithm (FA-ANFIS), particle swarm optimization (PSO-ANFIS), and teaching–learning-based optimization (TLBO-ANFIS), to estimate ET0 in the subtropical climatic zone of Bangladesh. The results indicated that the integrated prediction method outperformed most of the individual models [132]. Troncoso-García et al. used historical evapotranspiration values as inputs to a recurrent long short-term memory network optimized with the bio-inspired Coronavirus Optimization Algorithm (CVOA). The model achieved higher ET0 prediction accuracy and significantly reduced computational time compared with traditional methods, making it the leading approach for reference evapotranspiration prediction among the studies reviewed here [133].
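In hybrids such as ANN-GWO, the metaheuristic searches over model weights or hyperparameters to minimize validation error. The sketch below implements the basic grey wolf optimizer: the three best wolves (alpha, beta, delta) guide the pack, and the coefficient `a` shrinks from 2 to 0 to shift from exploration to exploitation. A simple sphere function stands in for the ANN's validation error; the pack size, iteration count, and bounds are assumptions.

```python
import numpy as np


def grey_wolf_optimizer(objective, dim, lower, upper, n_wolves=20, n_iter=200, seed=0):
    """Minimize `objective` over a box with the basic Grey Wolf Optimizer."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lower, upper, size=(n_wolves, dim))
    best_x, best_f = None, np.inf
    for it in range(n_iter):
        fitness = np.array([objective(w) for w in wolves])
        order = np.argsort(fitness)
        if fitness[order[0]] < best_f:                    # track best-so-far solution
            best_x, best_f = wolves[order[0]].copy(), float(fitness[order[0]])
        leaders = wolves[order[:3]]                       # alpha, beta, delta wolves
        a = 2.0 * (1.0 - it / n_iter)                     # decreases linearly from 2 towards 0
        candidate = np.zeros_like(wolves)
        for leader in leaders:
            A = a * (2.0 * rng.random(wolves.shape) - 1.0)   # A = 2a*r1 - a
            C = 2.0 * rng.random(wolves.shape)               # C = 2*r2
            D = np.abs(C * leader - wolves)
            candidate += leader - A * D
        wolves = np.clip(candidate / 3.0, lower, upper)   # average of the three moves
    return best_x, best_f


def sphere(w: np.ndarray) -> float:
    """Stand-in objective; in ANN-GWO this would be the ANN's validation error."""
    return float(np.sum(w ** 2))


x_best, f_best = grey_wolf_optimizer(sphere, dim=5, lower=-10.0, upper=10.0)
print(f_best)
```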

In summary, meteorological variables such as temperature, solar radiation, wind speed, and humidity closely influence ET0 [134]. Current studies are geographically limited to regions with specific climates, and their models cannot be readily extended to regions with different climatic conditions. One future research direction is integrating remote sensing data, such as satellite imagery, to capture the spatial and environmental factors influencing ET0 dynamics. Additionally, combining multiple machine learning methods to improve ET0 prediction accuracy and developing adaptive, transferable models suitable for different climates represent promising avenues for future exploration.

3.6. Groundwater Level Prediction

Groundwater constitutes a vital component of the Earth’s water resources, sustaining ecosystems. Limited access to freshwater impacts around one third of the global population, and groundwater stocks are frequently depleted more rapidly than natural recharge processes [135]. The prediction of groundwater levels (GWLs) is crucial for sustainable water resource management, particularly in arid and semi-arid regions that heavily depend on groundwater [136,137,138]. Traditional methods for monitoring groundwater levels involve measurements using steel rules and sonic devices. However, in recent years, the use of machine learning has increasingly gained prominence as a promising alternative for predicting groundwater levels [139,140].

Iqbal et al. developed an ANN-based model using precipitation, temperature, solar radiation, and humidity to accurately predict groundwater levels in the Ravi and Sutlej river regions [141]. Natarajan et al. used temperature and precipitation as features to predict groundwater levels in the Vizianagaram district and found that ELM achieved higher accuracy than ANN, Genetic Programming (GP), and SVM [142]. Liu et al. devised an SVM-based model incorporating a data assimilation technique, using temperature, solar radiation, and precipitation as features, to predict GWL changes over 1–3-month time scales in 46 groundwater wells in the northeastern United States [143]. Rahman et al. focused on temperature and rainfall characteristics and developed wavelet-transform-based machine learning models (WT-XGBL, WT-XGBT, and WT-RF). Compared with the standalone models (XGBL, XGBT, and RF), the WT-based models significantly improved GWL prediction [144]. Zhang et al. proposed a hybrid model incorporating a corrected WT that used precipitation, evapotranspiration, and temperature as features, and applied the Shapley value approach to analyze the influence of each feature on groundwater level predictions [136]. The Shapley value is a useful indicator of the combined impact of multiple features on the results: each feature's Shapley value represents its average marginal contribution to the prediction across all combinations of the other features. In other words, the changes in model accuracy caused by removing or adding specific features form the basis for measuring their importance [145].
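The Shapley value described above can be computed exactly when there are only a few features: for each feature, average its marginal contribution to a model score over every subset of the other features, weighting each subset by |S|!(n−|S|−1)!/n!. The sketch below does this for three features; the score table is hypothetical (in practice each entry would come from retraining or re-evaluating the model, and tools such as SHAP approximate this for larger feature sets).

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a small feature set.

    value(subset) returns the model score obtained with that frozenset of
    features; each phi averages the feature's marginal contribution.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical skill scores of a groundwater-level model trained on each feature subset.
scores = {
    frozenset(): 0.0,
    frozenset({"precipitation"}): 0.5,
    frozenset({"evapotranspiration"}): 0.3,
    frozenset({"temperature"}): 0.2,
    frozenset({"precipitation", "evapotranspiration"}): 0.7,
    frozenset({"precipitation", "temperature"}): 0.6,
    frozenset({"evapotranspiration", "temperature"}): 0.4,
    frozenset({"precipitation", "evapotranspiration", "temperature"}): 0.8,
}
features = ["precipitation", "evapotranspiration", "temperature"]
phi = shapley_values(features, scores.__getitem__)
print({f: round(v, 3) for f, v in phi.items()})
# {'precipitation': 0.433, 'evapotranspiration': 0.233, 'temperature': 0.133}
```

The values sum to the score of the full model minus the score of the empty model (0.8 here), which is the efficiency property that makes Shapley values a complete attribution.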

The studies demonstrate that precipitation, solar radiation, and temperature are commonly employed to study groundwater levels. The spatial integrity of groundwater level data is often neglected in the treatment of time series data. An irregular distribution of groundwater monitoring stations can lead to missing or abnormal data in certain areas. In future research, machine learning could be employed to estimate and rectify missing or abnormal data by integrating water level data with station data [139].
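One simple form that station-based gap filling could take is a distance-weighted k-nearest-neighbour estimate from monitoring wells that have valid readings at the same time step. The sketch below is hypothetical throughout (function name, well layout, k, and weighting exponent); a fuller approach would also use each well's own history, for example by training a regressor on neighbouring-station data.

```python
import math
from typing import List, Tuple

Station = Tuple[Tuple[float, float], float]  # ((x, y) coordinates, groundwater level)


def knn_impute_level(
    target_xy: Tuple[float, float],
    stations: List[Station],
    k: int = 2,
    power: float = 1.0,
) -> float:
    """Estimate a missing level at target_xy from the k nearest valid stations.

    Weights are 1 / distance**power (a distance-weighted k-NN regressor).
    """
    nearest = sorted((math.dist(target_xy, xy), level) for xy, level in stations)[:k]
    for dist, level in nearest:
        if dist == 0.0:
            return level  # co-located station: use its reading directly
    weights = [1.0 / dist ** power for dist, _ in nearest]
    return sum(w * level for w, (_, level) in zip(weights, nearest)) / sum(weights)


# Hypothetical wells on a transect with readings at the same time step (m).
wells = [((0.0, 0.0), 10.0), ((1.0, 0.0), 11.0), ((3.0, 0.0), 13.0)]
print(knn_impute_level((2.0, 0.0), wells))  # 12.0
```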

3.7. Water Quality Prediction

Water is one of the most valuable natural resources and is vital for human survival. However, the escalating emission of pollutants from human activities has made water pollution a significant problem that threatens ecosystems and the sustainable development of human society [146]. To safeguard ecological security and human health, close monitoring of water quality is of great significance for the better management and utilization of water resources [147]. Researchers have developed numerous methods for monitoring and evaluating water quality, including multivariate statistical methods, fuzzy reasoning, and the water quality index (WQI) [148]. Statistical water quality models typically assume normally distributed data and linear correlations between predictor and response variables [149]. However, such traditional data processing techniques are no longer sufficient to capture the combined impact of multiple parameters on water quality. Machine learning, by contrast, generally does not model internal processes; instead, it learns the relationship between inputs and outputs directly from data. Owing to cost and time constraints, evaluating water quality on the basis of individual parameters is often not feasible. Because the WQI considers multiple parameters simultaneously, it is typically a more suitable tool for evaluating water quality [150,151]. Mijares et al. proposed the first WQI model in 1965. Since then, scientists and environmental organizations have used the WQI as a quantitative measure of water quality [152].
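WQI models generally select parameters, convert each measurement into a sub-index, weight the sub-indices, and aggregate them into one score. One widely used scheme, the weighted arithmetic index, can be sketched as below: each sub-index scales the measurement between an ideal value and a permissible limit, and weights are inversely proportional to the limit, so stricter parameters count more. The parameter choice, limits, and measurements in the usage lines are hypothetical, not regulatory values.

```python
def weighted_arithmetic_wqi(values, standards, ideals):
    """Weighted arithmetic WQI from per-parameter measurements.

    Sub-index:  q_i = 100 * (V_i - Vid_i) / (S_i - Vid_i)
    Weight:     w_i proportional to 1 / S_i (any normalizing constant cancels)
    Aggregate:  WQI = sum(q_i * w_i) / sum(w_i)
    """
    sub_index = {p: 100.0 * (values[p] - ideals[p]) / (standards[p] - ideals[p]) for p in values}
    weight = {p: 1.0 / standards[p] for p in values}
    return sum(sub_index[p] * weight[p] for p in values) / sum(weight.values())


# Hypothetical measurements and limits (mg/L), for illustration only.
values = {"BOD": 1.5, "nitrate": 20.0}
standards = {"BOD": 5.0, "nitrate": 45.0}
ideals = {"BOD": 0.0, "nitrate": 0.0}
print(round(weighted_arithmetic_wqi(values, standards, ideals), 2))  # 31.44
```

Machine learning models for WQI typically learn to reproduce such a score from raw measurements, or replace the fixed weights with data-driven feature importance.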

Most WQI models comprise four stages: (1) selecting water quality parameters, (2) determining parameter sub-indexes, (3) determining parameter weights, and (4) aggregating the sub-indexes to calculate an overall WQI [153]. Uddin et al. used water temperature, total organic nitrogen, ammonia, dissolved oxygen (DO), and pH from coastal sites in Cork Harbour as features to compare eight commonly used algorithms: RF, DT, KNN, XGB, Extra Tree (ExT), SVM, LR, and Gaussian Naïve Bayes (GNB). The results indicated that the DT, ExT, and XGB models were effective and consistent in predicting WQI, significantly reducing model uncertainty [154]. Ding et al. suggested optimizing the WQI model by refining the parameter weights [155]. In 2022, Uddin et al. also used the XGB algorithm to predict WQI, with salinity, water temperature, pH, transparency, and DO as features. They ranked and selected features by their relative importance to the overall water quality condition, which made the method more objective; among the studies reviewed here, it was the best-performing model for predicting WQI [156]. In recent years, some scholars have applied deep learning to predict WQI. Talukdar et al. proposed a deep-learning-based stacked model for predicting WQI that integrates three models, namely a Gradient Boosting Machine (GBM), a Generalized Linear Model (GLM), and an ANN, with pH, temperature, total dissolved solids (TDS), DO, and biochemical oxygen demand (BOD) as features. The stacked model was also used to analyze the sensitivity and uncertainty of the parameters affecting WQI, and validation results showed that it predicted WQI accurately even with limited data [157]. Brester et al. used temperature, pH, and the concentrations of copper and iron ions collected from a pilot drinking water distribution system as features to develop an RF-based model that predicts the abundance of bacterial populations in real time [158]. Liu et al. developed a drinking water quality model that used pH, DO, conductivity, turbidity, and chemical oxygen demand to predict water quality [159]. Using temperature, microbial abundance, and precipitation as features, Sokolova et al. evaluated the applicability of machine learning for predicting Escherichia coli concentrations in the Göta älv river at the intake of the Gothenburg Drinking Water Treatment Plant. The RF and Tree-based Pipeline Optimization Tool (TPOT) models produced better predictions than the other models [160]. Salinity, pH, water temperature, DO, and microbial abundance are commonly selected as features for water quality prediction. In practical applications, water quality data are continuously updated, and their distribution may change gradually as a result. Incremental learning algorithms use online learning to continuously add new data to the model and update its parameters or structure, maintaining the model's accuracy and reliability; such algorithms are a promising direction for future water quality prediction.
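Recursive least squares (RLS) with exponential forgetting is one classic incremental learning algorithm of the kind described above: each new sample updates the model in constant time, and older samples are gradually down-weighted so the model can follow drift in the data distribution. The sketch below uses a linear model as a stand-in for a water quality predictor; the feature, the forgetting factor, and the simulated drift are hypothetical.

```python
import numpy as np


class RecursiveLeastSquares:
    """Online linear regressor with exponential forgetting."""

    def __init__(self, n_features: int, forgetting: float = 0.98, delta: float = 1000.0):
        self.w = np.zeros(n_features)
        self.P = delta * np.eye(n_features)   # inverse-covariance estimate
        self.lam = forgetting                 # < 1 down-weights older samples

    def update(self, x: np.ndarray, y: float) -> float:
        """Incorporate one (x, y) pair; returns the error before the update."""
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)
        err = y - self.w @ x
        self.w = self.w + gain * err
        self.P = (self.P - np.outer(gain, Px)) / self.lam
        return float(err)

    def predict(self, x: np.ndarray) -> float:
        return float(self.w @ x)


rng = np.random.default_rng(0)
model = RecursiveLeastSquares(n_features=2, forgetting=0.95)
for step in range(600):
    x_val = rng.uniform(0.0, 10.0)
    # Hypothetical drift: the input-output relation changes halfway through the stream.
    y_val = 2.0 * x_val + 1.0 if step < 300 else -1.0 * x_val + 3.0
    model.update(np.array([x_val, 1.0]), y_val)
print(model.w.round(3))  # approximately [-1, 3]
```

A smaller forgetting factor adapts faster to drift but makes estimates noisier when the data are noisy; the value is a tuning choice.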

4. Discussion and Conclusions

Water resource modeling holds great significance for water resource management and sustainable development. As indicated in Table 3, the application of machine learning in water resource modeling has progressed steadily, offering novel ideas and methods to address intricate water resource challenges. For instance, in the context of rainfall prediction, factors including pressure, temperature, humidity, potential, latitude, and long-term wind speed can be analyzed to forecast the distribution of rainfall for a specific future timeframe, thereby establishing a decision-making foundation for flood control and drought resistance. Regarding runoff prediction, factors such as rainfall, evapotranspiration, temperature, and radiation can be analyzed to predict the distribution of runoff for a specific future period, thereby offering a decision-making basis for water resource scheduling. For water quality prediction, monitoring and prediction can be accomplished by analyzing salinity, pH, water temperature, DO, and microbial abundance. This helps identify characteristics and patterns related to changes in water quality.

This review examines the application of machine learning to key hydrological factors, including precipitation, flood, runoff, soil moisture, evapotranspiration, groundwater level, and water quality. The models involved mainly include traditional statistical models (MLR and SVR), connectionist models (ANN and deep learning), and tree-based models (DT and RF). Although deep learning has achieved good results in predicting many hydrological factors, its lack of interpretability hinders our understanding of the mechanisms underlying hydrological processes. This situation can be improved through interpretable models, such as RF, which can provide clear rules and decision boundaries that explain why particular predictions are made. Additionally, these models can be coupled with explanation methods such as Shapley Additive Explanations (SHAP) and Local Interpretable Model-agnostic Explanations (LIME), giving users insight into the internal workings of machine learning models [161,162].

Although machine learning has made significant strides in water resource modeling, challenges persist. First, the accuracy and reliability of machine learning models depend on the quality and completeness of input data. Water resource prediction may draw on diverse data sources, including meteorological, hydrological, and geographic information, and variations in data quality or the presence of missing or abnormal data can adversely affect model training and prediction. Ensuring data quality and integrity is therefore a crucial issue. Second, determining the amount of data required to build a high-performance machine learning model is challenging. More data generally tends to improve prediction accuracy; however, some studies have shown that even relatively small datasets, in the form of time series or spatially distributed data, can achieve good prediction accuracy for the problems of interest. Finally, the non-stationarity of hydrological sequences poses new challenges for research tasks such as hydrological forecasting, and traditional machine learning methods cannot be applied directly under non-stationary conditions. Future work should further explore the underlying physical mechanisms and combine them with machine learning methods and hydrological–meteorological teleconnections to establish forecasting models for non-stationary conditions.

With technological advancements, the application of machine learning technology in water resource prediction will become more extensive and in-depth. By enhancing the integration of machine learning with other technologies, such as GIS, radar remote sensing, drones, etc., richer and more accurate data can be provided, further improving the accuracy of hydrological forecasting. Additionally, with the development of technologies like the Internet of Things and cloud computing, machine learning may achieve real-time prediction, enabling the prediction results to be quickly adjusted based on the latest data. This is crucial for water resource management, as real-time predictions can help decision-makers respond more promptly. Finally, reinforcement learning may play a greater role in water resource management, aiding decision-makers in developing optimal water resource scheduling strategies. This learning-based decision support system can automatically provide reasonable suggestions based on historical and real-time data.

Overall, the future development of machine learning in water resource prediction will be diverse and comprehensive. Multi-disciplinary cooperation and cross-disciplinary exchanges are necessary to promote the development of this field. In summary, the application of machine learning in water resource modeling is of significant research importance. It not only improves the accuracy of water resource modeling and optimizes water resource management but also contributes to the protection of water resources. Therefore, it is crucial to strengthen the application research of machine learning in water resource modeling and develop more scientific and effective technical solutions to address global water resource challenges.

Author Contributions

Conceptualization, Z.L.; writing—original draft preparation, Z.L., J.Z., X.Y. and Z.Z.; writing—review and editing, Z.L. and Y.L.; visualization, X.Y.; supervision, Z.L.; project administration, Z.L.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Joint Funds of the National Natural Science Foundation of China (U2243235), and the National Natural Science Foundation of China (61902323).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

ACC (Accuracy)	AIC (Akaike Information Criterion)
ANFIS (Adaptive Neuro-Fuzzy Inference System)	ANN (Artificial Neural Network)
AVHRR (Advanced Very High-Resolution Radiometer)	AUC (Area Under Curve)
AED (Automatic Encoding Decoding)	BP (Backpropagation)
BBO (Biogeography-Based Optimization)	BIC (Bayesian Information Criterion)
BID (Bidirectional)	BOD (Biochemical Oxygen Demand)
Boruta (Boruta wrapper feature selection algorithm)	CNM (Complex Network Model)
CNN (Convolutional Neural Network)	CVOA (Coronavirus Optimization Algorithm)
DBI (Davies-Bouldin Index)	DEM (Digital Elevation Model)
DI (Dunn Index)	DO (Dissolved Oxygen)
DNN (Deep Neural Network)	DT (Decision Tree)
EEMD (Ensemble Empirical Mode Decomposition)	ELM (Extreme Learning Machine)
EnKF (Ensemble Kalman Filter)	ET (Evapotranspiration)
ET0 (Reference Crop Evapotranspiration)	ExT (Extra Tree)
FA (Firefly Algorithm)	FFBP (Feed-Forward Back-Propagation)
FMI (Fowlkes and Mallows Index)	GA (Genetic Algorithm)
GBDT (Gradient Boosting Decision Tree)	GBM (Gradient Boosting Machine)
GLM (Generalized Linear Model)	GNB (Gaussian Naïve Bayes)
GP (Genetic Programming)	GRNN (General Regression Neural Network)
GRU (Gated Recurrent Unit)	GWL (Groundwater Level)
GWO (Grey Wolf Optimization)	GIS (Geographic Information System)
GTB (Gradient Tree Boosting)
IWM (Integrated Watershed Management)	JC (Jaccard Coefficient)
KNN (K-Nearest Neighbors)	LDA (Linear Discriminant Analysis)
LR (Logistic Regression)	LSTM (Long Short-Term Memory)
MAE (Mean Absolute Error)	MCC (Matthews Correlation Coefficient)
MIM (Morphological Inundation Models)	MLR (Multiple Linear Regression)
MRE (Mean Relative Error)	MSE (Mean Squared Error)
MVMD (Multivariate Variational Mode Decomposition)	NRMSE (Normalized Root Mean Square Error)
NSE (Nash–Sutcliffe Efficiency Coefficient)	NDVI (Normalized Difference Vegetation Index)
PSO (Particle Swarm Optimization)	QPE (Quantitative Precipitation Estimation)
RF (Random Forest)	RI (Rand Index)
RMSE (Root Mean Squared Error)	RNN (Recurrent Neural Network)
ROM (Reduced Order Model)	SMLR (Stepwise Multiple Linear Regression)
Sn (Sensitivity)	SOM (Self-Organizing Map)
SONN (Second-order Neural Network)	Sp (Specificity)
SVM (Support Vector Machine)	SVR (Support Vector Regression)
TCN (Temporal Convolutional Network)	TLBO (Teaching-Learning-Based Optimization)
TDS (Total Dissolved Solids)	TPOT (Tree-based Pipeline Optimization Tool)
TRMM (Tropical Rainfall Measuring Mission)	WNB (Weighted Naïve Bayes)
WQI (Water Quality Index)	WT (Wavelet Transform)
XGB (Extreme Gradient Boosting)

References

Ren, J.; Zhao, D. Recent Advances in Reticular Chemistry for Clean Energy, Global Warming, and Water Shortage Solutions. Adv. Funct. Mater. 2023, 5, 2307778. [Google Scholar] [CrossRef]
Gharib, A.A.; Blumberg, J.; Manning, D.T.; Goemans, C.; Arabi, M. Assessment of vulnerability to water shortage in semi-arid river basins: The value of demand reduction and storage capacity. Sci. Total. Environ. 2023, 871, 161964. [Google Scholar] [CrossRef] [PubMed]
Yang, J.; Yang, P.; Zhang, S.; Wang, W.Y.; Cai, Y.; Hu, S. Evaluation of water resource carrying capacity in the middle reaches of the Yangtze River Basin using the variable fuzzy-based method. Environ. Sci. Pollut. Res. 2022, 30, 30572–30587. [Google Scholar] [CrossRef] [PubMed]
Li, Y.; Han, Y.; Liu, B.; Li, H.; Du, X.; Wang, Q.; Wang, X.; Zhu, X. Construction and application of a refined model for the optimal allocation of water resources—Taking Guantao County, China as an example. Ecol. Indic. 2023, 146, 109929. [Google Scholar] [CrossRef]
Li, Z.; Liu, H. Temporal and spatial variations of precipitation change from Southeast to Northwest China during the period 1961–2017. Water 2020, 12, 2622. [Google Scholar] [CrossRef]
Guo, Y.; Shen, Y. Agricultural water supply/demand changes under projected future climate change in the arid region of northwestern China. J. Hydrol. 2016, 540, 257–273. [Google Scholar] [CrossRef]
Zhang, Y.; Khan, S.U.; Swallow, B.; Liu, W.; Zhao, M. Coupling coordination analysis of China’s water resources utilization efficiency and economic development level. J. Clean. Prod. 2022, 373, 133874. [Google Scholar] [CrossRef]
Zhang, H.; Jin, G.; Yu, Y. Review of river basin water resource management in China. Water 2018, 10, 425. [Google Scholar] [CrossRef]
Lin, L.; Yang, H.; Xu, X. Effects of water pollution on human health and disease heterogeneity: A review. Front. Environ. Sci. 2022, 10, 880246. [Google Scholar] [CrossRef]
Makanda, K.; Nzama, S.; Kanyerere, T. Assessing the role of water resources protection practice for sustainable water resources management: A review. Water 2022, 14, 3153. [Google Scholar] [CrossRef]
Loucks, D.P. Sustainable water resources management. Water Int. 2000, 25, 3–10. [Google Scholar] [CrossRef]
Vorosmarty, C.J.; Green, P.; Salisbury, J.; Lammers, R.B. Global water resources: Vulnerability from climate change and population growth. Science 2000, 289, 284–288. [Google Scholar] [CrossRef] [PubMed]
Wu, B.; Zheng, Y.; Wu, X.; Tian, Y.; Han, F.; Liu, J.; Zheng, C.B. Optimizing water resources management in large river basins with integrated surface water-groundwater modeling: A surrogate-based approach. Water Resour. Res. 2015, 51, 2153–2173. [Google Scholar] [CrossRef]
Wang, J.; Shi, P.; Jiang, P.; Hu, J.; Qu, S.; Chen, X.; Chen, Y.; Dai, Y.; Xiao, Z. Application of BP neural network algorithm in traditional hydrological model for flood forecasting. Water 2017, 9, 48. [Google Scholar] [CrossRef]
Shen, H.; Tolson, B.A.; Mai, J. Time to update the split-sample approach in hydrological model calibration. Water Resour. Res. 2022, 58, e2021WR031523. [Google Scholar] [CrossRef]
Rani, K.S.; Kumari, M.; Singh, V.B.; Sharma, M. Deep learning with big data: An emerging trend. In Proceedings of the 2019 19th International Conference on Computational Science and Its Applications (ICCSA), Saint Petersburg, Russia, 1–4 July 2019; pp. 93–101. [Google Scholar]
Anjum, R.; Parvin, F.; Ali, S.A. Machine Learning Applications in Sustainable Water Resource Management: A Systematic Review. In Emerging Technologies for Water Supply, Conservation and Management; Springer Water; Springer: Cham, Switzerland, 2023; pp. 29–47. [Google Scholar]
Mekanik, F.; Imteaz, M.A.; Gato-Trinidad, S.; Elmahdi, A. Multiple regression and artificial neural network for long-term rainfall forecasting using large scale climate modes. J. Hydrol. 2013, 503, 11–21. [Google Scholar] [CrossRef]
Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat, F. Deep learning and process understanding for data-driven Earth system science. Nature 2019, 566, 195–204. [Google Scholar] [CrossRef] [PubMed]
Uc-Castillo, J.L.; Marín-Celestino, A.E.; Martínez-Cruz, D.A.; Tuxpan-Vargas, J.; Ramos-Leal, J.A. A systematic review and meta-analysis of groundwater level forecasting with machine learning techniques: Current status and future directions. Environ. Model. Softw. 2023, 168, 5788. [Google Scholar] [CrossRef]
Arrighi, C.; Castelli, F. Prediction of ecological status of surface water bodies with supervised machine learning classifiers. Sci. Total Environ. 2023, 857, 159655. [Google Scholar] [CrossRef]
Ghobadi, F.; Kang, D. Application of Machine Learning in Water Resources Management: A Systematic Literature Review. Water 2023, 15, 620. [Google Scholar] [CrossRef]
Mosaffa, H.; Sadeghi, M.; Mallakpour, I.; Jahromi, M.N.; Pourghasemi, H.R. Chapter 43-Application of machine learning algorithms in hydrology. Comput. Earth Environ. Sci. 2022, 2–3, 585–591. [Google Scholar]
Zounemat-Kermani, M.; Batelaan, O.; Fadaee, M.; Hinkelmann, R. Ensemble machine learning paradigms in hydrology: A review. J. Hydrol. 2021, 598, 126266. [Google Scholar] [CrossRef]
Başağaoğlu, H.; Chakraborty, D.; Lago, C.D.; Gutierrez, L.; Şahinli, M.A.; Giacomoni, M.; Furl, C.; Mirchi, A.; Moriasi, D.; Şengör, S.S. A Review on Interpretable and Explainable Artificial Intelligence in Hydroclimatic Applications. Water 2022, 14, 1230. [Google Scholar] [CrossRef]
Collins, C.; Dennehy, D.; Conboy, K.; Mikalef, P. Artificial Intelligence in Information Systems Research: A Systematic Literature Review and Research Agenda. Int. J. Inf. Manag. 2021, 60, 102383. [Google Scholar] [CrossRef]
Tufail, S.; Riggs, H.; Tariq, M.; Sarwat, A.I. Advancements and Challenges in Machine Learning: A Comprehensive Review of Models, Libraries, Applications, and Algorithms. Electronics 2023, 12, 1789. [Google Scholar] [CrossRef]
Jain, A.K.; Murty, M.N.; Flynn, P. Data Clustering: A Review. ACM Comput. Surv. 1999, 31, 264–323. [Google Scholar] [CrossRef]
Nasteski, V. An Overview of the Supervised Machine Learning Methods. Horizons. B. 2017, 4, 51–62. [Google Scholar] [CrossRef]
Frénay, B.; Verleysen, M. Classification in the Presence of Label Noise: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2013, 25, 845–869. [Google Scholar] [CrossRef] [PubMed]
Hu, Y. Identification and Estimation of Nonlinear Models with Misclassification Error Using Instrumental Variables: A General Solution. J. Econom. 2008, 144, 27–61. [Google Scholar] [CrossRef]
Turkyilmazoglu, M. Nonlinear Problems via a Convergence Accelerated Decomposition Method of Adomian. Comput. Model. Eng. Sci. 2021, 127, 1. [Google Scholar] [CrossRef]
Raschka, S. Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning. arXiv 2018, arXiv:1811.12808. [Google Scholar]
Zhou, Z.H. Machine Learning, 1st ed.; Tsinghua University Press: Beijing, China, 2016. [Google Scholar]
Yadav, S.; Shukla, S. Analysis of k-Fold Cross-Validation over Hold-Out Validation on Colossal Datasets for Quality Classification. In Proceedings of the 2016 IEEE 6th International Conference on Advanced Computing (IACC), Bhimavaram, India, 27–28 February 2016; pp. 78–83. [Google Scholar]
Rahman, A.-U.; Abbas, S.; Gollapalli, M.; Ahmed, R.; Aftab, S.; Ahmad, M.; Khan, M.A.; Mosavi, A. Rainfall Prediction System Using Machine Learning Fusion for Smart Cities. Sensors 2022, 22, 3504. [Google Scholar] [CrossRef] [PubMed]
Salaeh, N.S.; Ditthakit, S.; Pinthong, M.A.; Islam, S.M.; Mohammadi, B.; Linh, N.T. Long-Short Term Memory Technique for Monthly Rainfall Prediction in Thale Sap Songkhla River Basin, Thailand. Symmetry 2022, 14, 1599. [Google Scholar] [CrossRef]
Hu, Y.; Wu, J. Analysis and prediction of spatial characteristics of precipitation based on ARIMA model. Jiangxi Sci. 2021, 39, 99–104. [Google Scholar]
Wang, B.; Liang, X.J.; Zhang, H.; Wang, L.; Wei, Y. Vulnerability of hydropower generation to climate change in China: Results based on grey forecasting model. Energy Policy 2014, 65, 701–707. [Google Scholar] [CrossRef]
Gui, Y.; Shao, J. Prediction of precipitation based on weighted Markov chain in Dangshan. In Proceedings of the International Conference on High Performance Compilation, Computing and Communications, Kuala Lumpur, Malaysia, 22–24 March 2017; pp. 81–85. [Google Scholar]
Dimri, A.P.; Joshi, P.; Ganju, A. Precipitation forecast over western Himalayas using k-nearest neighbour method. Int. J. Climatol. J. R. Meteorol. Soc. 2008, 28, 1921–1931. [Google Scholar] [CrossRef]
Ghazvinian, H.; Bahrami, H.; Ghazvinian, H.; Heddam, S. Simulation of monthly precipitation in Semnan city using ANN artificial intelligence model. J. Soft Comput. Civ. Eng. 2020, 4, 36–46. [Google Scholar]
Wolfensberger, D.; Gabella, M.; Boscacci, M.; Germann, U.; Berne, A. Rainforest: A random forest algorithm for quantitative precipitation estimation over Switzerland. Atmos. Meas. Tech. 2021, 14, 3169–3193. [Google Scholar] [CrossRef]
Umirbekov, A.; Peña-Guerrero, M.D.; Müller, D. Regionalization of climate teleconnections across central Asian mountains improves the predictability of seasonal precipitation. Environ. Res. Lett. 2022, 17, 055002. [Google Scholar] [CrossRef]
Kumar, D.; Singh, A.; Samui, P.; Jha, R.K. Forecasting monthly precipitation using sequential modelling. Hydrol. Sci. J. 2019, 64, 690–700. [Google Scholar] [CrossRef]
Shen, Z.Y.; Ban, W.C. Machine learning model combined with CEEMDAN algorithm for monthly precipitation prediction. Earth Sci. Inform. 2023, 16, 1821–1833. [Google Scholar] [CrossRef]
Zhang, X.Q.; Zhao, D.; Wang, T.; Wu, X.L.; Duan, B.S. A novel rainfall prediction model based on CEEMDAN-PSO-ELM coupled model. Water Supply 2022, 22, 4531–4543. [Google Scholar] [CrossRef]
Bi, K.; Xie, L.; Zhang, H.; Chen, X.; Gu, X.; Tian, Q. Accurate medium-range global weather forecasting with 3D neural networks. Nature 2023, 619, 533–538. [Google Scholar] [CrossRef] [PubMed]
Hao, G.; Li, J.; Song, L.; Li, H.E.; Li, Z.L. Comparison between the TOPMODEL and the Xin’anjiang model and their application to rainfall runoff simulation in semi-humid regions. Environ. Earth Sci. 2018, 77, 279. [Google Scholar] [CrossRef]
Le, X.H.; Ho, H.V.; Lee, G.; Jung, S. Application of Long Short-Term Memory (LSTM) Neural Network for Flood Forecasting. Water 2019, 11, 1387. [Google Scholar] [CrossRef]
Rezaeianzadeh, M.; Tabari, H.; Arabi Yazdi, A.; Isik, S.; Kalin, L. Flood flow forecasting using ANN, ANFIS and regression models. Neural Comput. Appl. 2014, 25, 25–37. [Google Scholar] [CrossRef]
Bai, Y.; Zhao, Y.; Shao, Y.; Zhang, X.F.; Yuan, X.F. Deep learning in different remote sensing image categories and applications: Status and prospects. Int. J. Remote Sens. 2022, 43, 1800–1847. [Google Scholar] [CrossRef]
Xu, J.H.; Lin, Z.M. Application of multiple linear regression method in short-term flood forecasting. J. China Hydrol. 1981, 6, 5–8. [Google Scholar]
Zhang, X.Y.; Dong, Z.C.; Wang, J.Q.; Zhao, J.X. Method of flood forecasting for Niyang River Basin of Yarlungzangbo River. J. Hohai Univ. 2005, 5, 530–533. [Google Scholar]
Latt, Z.Z.; Wittenberg, H. Improving flood forecasting in a developing country: A comparative study of stepwise multiple linear regression and artificial neural network. Water Resour. Manag. 2014, 28, 2109–2128. [Google Scholar] [CrossRef]
Jain, S.K.; Das, A.; Srivastava, D.K. Application of ANN for reservoir inflow prediction and operation. J. Water Resour. Plan. Manag. 1999, 125, 263–271. [Google Scholar] [CrossRef]
Hsu, K.L.; Gupta, H.V.; Sorooshian, S. Artificial neural network modeling of the rainfall-runoff process. Water Resour. Res. 1995, 31, 2517–2530. [Google Scholar] [CrossRef]
Liong, S.Y.; Sivapragasam, C. Flood stage forecasting with support vector machines. JAWRA J. Am. Water Resour. Assoc. 2007, 38, 173–186. [Google Scholar] [CrossRef]
Nguyen, D.T.; Chen, S.T. Real-time probabilistic flood forecasting using multiple machine learning methods. Water 2020, 12, 787. [Google Scholar] [CrossRef]
Lohani, A.K.; Kumar, R.; Singh, R.D. Hydrological time series modeling: A comparison between adaptive neuro-fuzzy, neural network and autoregressive techniques. J. Hydrol. 2012, 442–443, 23–35. [Google Scholar] [CrossRef]
Li, B.; Yang, G.; Wan, R.; Dai, X.; Zhang, Y.H. Comparison of random forests and other statistical methods for the prediction of lake water level: A case study of the Poyang Lake in China. Hydrol. Res. 2016, 47, 69–83. [Google Scholar] [CrossRef]
Kabir, S.R.; Patidar, S.; Xia, X. A deep convolutional neural network model for rapid prediction of fluvial flood inundation. J. Hydrol. 2020, 590, 125481. [Google Scholar] [CrossRef]
Hosseiny, H. A deep learning model for predicting river flood depth and extent. Environ. Model. Softw. 2021, 145, 105186. [Google Scholar] [CrossRef]
Hu, R.; Fang, F.; Pain, C.C. Rapid spatio-temporal flood prediction and uncertainty quantification using a deep learning method. J. Hydrol. 2019, 575, 911–920. [Google Scholar] [CrossRef]
Moshe, Z.; Metzger, A.; Elidan, G.; Kratzert, F. HydroNets: Leveraging River Structure for Hydrologic Modeling. arXiv 2020, arXiv:2007.00595. [Google Scholar]
Shu, Z.K.; Li, W.X.; Zhang, J.Y.; Jin, J.L.; Xue, Q.; Wang, Y.T.; Wang, G.Q. Historical changes and future trends of extreme precipitation and high temperature in China. Strateg. Study CAE 2022, 24, 116–125. [Google Scholar] [CrossRef]
Zhang, W.; Li, S.M.; Shi, Z.N. Causes and countermeasures of urban rainstorm waterlogging in China. J. Nat. Disasters 2012, 21, 180–184. [Google Scholar]
Xia, J.; Wang, H.J.; Gan, Y.Y.; Zhang, L.P. Research progress in forecasting methods of rainstorm and flood disaster in China. Torrential Rain Disasters 2019, 38, 416–421. [Google Scholar]
Krupka, M. A Rapid Inundation Flood Cell Model for Flood Risk Analysis; Heriot-Watt University: Edinburgh, UK, 2009. [Google Scholar]
Zhang, S.; Pan, B. An urban storm-inundation simulation method based on GIS. J. Hydrol. 2014, 517, 260–268. [Google Scholar] [CrossRef]
Huang, G.R.; Wang, X.; Huang, W. Simulation of rainstorm water logging in urban area based on InfoWorks ICM Model. Water Resour. Power 2017, 35, 66–70. [Google Scholar]
Zeng, Z.Y.; Wang, Z.L.; Wu, X.S.; Lai, C.G.; Chen, X.H. Rainstorm waterlogging simulations based on SWMM and LISFLOOD models. J. Hydroelectr. Eng. 2017, 36, 68–77. [Google Scholar]
Leitão, J.P.; Simões, N.E.; Maksimović, Č.; Ferreira, F.; Prodanović, D.; Matos, J.S.; Marques, A. Real-time forecasting urban drainage models: Full or simplified networks? Water Sci. Technol. 2010, 62, 2106–2114. [Google Scholar] [CrossRef]
Guo, Z.; Leitao, J.P.; Simoes, N.E.; Moosavi, V. Data-driven flood emulation: Speeding up urban flood predictions by deep convolutional neural networks. J. Flood Risk Manag. 2020, 14, 12684. [Google Scholar] [CrossRef]
Zheng, S.S.; Wan, Q.; Jia, M.Y. Short-term forecasting of waterlogging at urban storm-waterlogging monitoring sites based on STARMA model. Prog. Geogr. 2014, 33, 949–957. [Google Scholar]
Lai, W.; Wang, H.; Wang, C.; Zhang, J.; Zhao, Y. Waterlogging risk assessment based on self-organizing map (SOM) artificial neural networks: A case study of an urban storm in Beijing. J. Mt. Sci. 2017, 14, 898–905. [Google Scholar] [CrossRef]
Yan, J.; Jin, J.M.; Chen, F.R.; Yu, G.; Yin, H.L.; Wang, W.J. Urban flash flood forecast using support vector machine and numerical simulation. J. Hydroinformatics 2018, 20, 221–231. [Google Scholar] [CrossRef]
Wu, Z.; Zhou, Y.H.; Wang, H.L. Real-time prediction of the water accumulation process of urban stormy accumulation points based on deep learning. IEEE Access 2020, 8, 151938–151951. [Google Scholar] [CrossRef]
Li, H.H.; Wu, J.D.; Wang, Q.; Yang, C.; Pan, S. A study on rainstorm waterlogging disaster prediction models in Shanghai based on machine learning. J. Nat. Disasters 2021, 30, 191–200. [Google Scholar]
Wang, M.; Fu, X.; Zhang, D.; Lou, S.W.; Li, J.J.; Chen, F.R.; Li, S.; Tan, S.K. Urban agglomeration waterlogging hazard exposure assessment based on an integrated Naive Bayes classifier and complex network analysis. Nat. Hazards 2023, 118, 2173–2197. [Google Scholar] [CrossRef]
Kim, H.I.; Han, K.Y. Data-driven approach for the rapid simulation of urban flood prediction. KSCE J. Civ. Eng. 2020, 24, 1932–1943. [Google Scholar] [CrossRef]
Hou, J.M.; Zhou, N.; Chen, G.; Huang, M.S.; Bai, G.B. Rapid forecasting of urban flood inundation using multiple machine learning models. Nat. Hazards 2021, 108, 2335–2356. [Google Scholar] [CrossRef]
Milly, P.; Dunne, K.; Vecchia, A. Global pattern of trends in streamflow and water availability in a changing climate. Nature 2005, 438, 347–350. [Google Scholar] [CrossRef] [PubMed]
Kratzert, F.; Klotz, D.; Herrnegger, M.; Sampson, A.K.; Hochreiter, S.; Nearing, G.S. Toward improved predictions in ungauged basins: Exploiting the power of machine learning. Water Resour. Res. 2019, 55, 11344–11354. [Google Scholar] [CrossRef]
Fang, J.J.; Yang, L.; Wen, X.H.; Li, W.; Yu, H.; Zhou, T. A deep learning-based hybrid approach for multi-time-ahead streamflow prediction in an arid region of Northwest China. Hydrol. Res. 2024, nh2024124. [Google Scholar] [CrossRef]
Liu, J.; Ren, K.; Ming, K.; Qu, J.; Guo, W.; Li, H. Investigating the effects of local weather, streamflow lag, and global climate information on 1-month-ahead streamflow forecasting by using XGBoost and SHAP: Two case studies involving the contiguous USA. Acta Geophys. 2023, 71, 905–925. [Google Scholar] [CrossRef]
Yao, Z.; Wang, Z.; Wang, D.; Wu, J.; Chen, L. An ensemble CNN-LSTM and GRU adaptive weighting model based improved sparrow search algorithm for predicting runoff using historical meteorological and runoff data as input. J. Hydrol. 2023, 625, 129977. [Google Scholar] [CrossRef]
Liu, Y.; Wang, L.; Yang, L.; Liu, X.; Wang, L. Runoff Prediction and Analysis Based on Improved CEEMDAN-OS-QR-ELM. IEEE Access 2021, 9, 57311–57324. [Google Scholar] [CrossRef]
Yang, M.X.; Wang, H.; Jiang, Y.; Lu, X.; Xu, Z.; Sun, G. GECA Proposed Ensemble–KNN Method for Improved Monthly Runoff Forecasting. Water Resour. Manag. 2020, 34, 849–863. [Google Scholar] [CrossRef]
Ghumman, A.R.; Ghazaw, Y.M.; Sohail, A.R.; Watanabe, K. Runoff forecasting by artificial neural network and conventional model. Alex. Eng. J. 2011, 50, 345–350. [Google Scholar] [CrossRef]
Tan, Q.F.; Lei, X.H.; Wang, X.; Wang, H.; Wen, X.; Ji, Y.; Kang, A.Q. An adaptive middle and long-term runoff forecast model using EEMD-ANN hybrid approach. J. Hydrol. 2018, 567, 767–780. [Google Scholar] [CrossRef]
Liao, S.L.; Wang, H.; Liu, B.X.; Ma, X.; Zhou, B.; Su, H. Runoff Forecast Model Based on an EEMD-ANN and Meteorological Factors Using a Multicore Parallel Algorithm. Water Resour. Manag. 2023, 37, 1539–1555. [Google Scholar] [CrossRef]
Han, D.Y.; Liu, P.; Xie, K.; Li, H.; Xia, Q.; Cheng, Q.; Xia, J. An attention-based LSTM model for long-term runoff forecasting and factor recognition. Environ. Res. Lett. 2023, 18, 024004. [Google Scholar] [CrossRef]
Zealand, C.M.; Burn, D.H.; Simonovic, S.P. Short term streamflow forecasting using artificial neural networks. J. Hydrol. 1999, 214, 32–48. [Google Scholar] [CrossRef]
Hu, C.H.; Wu, Q.; Li, H.; Jian, S.; Li, N.; Lou, Z. Deep Learning with a Long Short-Term Memory Networks Approach for Rainfall-Runoff Simulation. Water 2018, 10, 1543. [Google Scholar] [CrossRef]
Gao, S.; Huang, Y.; Zhang, S.; Han, J.; Wang, G.; Zhang, M.; Lin, Q. Short-term runoff prediction with GRU and LSTM networks without requiring time step optimization during sample generation. J. Hydrol. 2020, 589, 125188. [Google Scholar] [CrossRef]
Naganna, S.R.; Sreedhara, B.; Muttana, S.; Marulasiddappa, S.B.; Balreddy, M.S.; Yaseen, Z.M. Daily scale streamflow forecasting in multiple stream orders of Cauvery River, India: Application of advanced ensemble and deep learning models. J. Hydrol. 2023, 626, 130320. [Google Scholar] [CrossRef]
Zhang, Y.; Zheng, H.; Zhang, X.; Leung, L.R.; Liu, C.; Zheng, C.; Blöschl, G. Future global streamflow declines are probably more severe than previously estimated. Nat. Water 2023, 1, 261–271. [Google Scholar] [CrossRef]
Seneviratne, S.I.; Corti, T.; Davin, E.L.; Hirschi, M.; Jaeger, E.B.; Lehner, I.; Teuling, A.J. Investigating soil moisture–climate interactions in a changing climate: A review. Earth Sci. Rev. 2010, 99, 125–161. [Google Scholar] [CrossRef]
Chakraborty, D.; Başağaoğlu, H.; Alian, S.; Mirchi, A.; Moriasi, D.N.; Starks, P.J.; Verser, J.A. Multiscale extrapolative learning algorithm for predictive soil moisture modeling & applications. Expert Syst. Appl. 2023, 213, 119056. [Google Scholar]
Cai, Y.; Zheng, W.G.; Zhang, X.; Zhangzhong, L.; Xue, X. Research on soil moisture prediction model based on deep learning. PLoS ONE 2019, 14, e0214508. [Google Scholar] [CrossRef] [PubMed]
Western, A.W.; Grayson, R.B.; Blöschl, G. Scaling of Soil Moisture: A Hydrologic Perspective. Annu. Rev. Earth Planet. Sci. 2002, 30, 149–180. [Google Scholar] [CrossRef]
Freeze, R.A.; Harlan, R.L. Blueprint for a physically-based, digitally-simulated hydrologic response model. J. Hydrol. 1969, 9, 237–258. [Google Scholar] [CrossRef]
Elshorbagy, A.; Corzo, G.; Srinivasulu, S.; Solomatine, D.P. Experimental investigation of the predictive capabilities of data-driven modeling techniques in hydrology—Part 1: Concepts and methodology. Hydrol. Earth Syst. Sci. 2010, 14, 1931–1941. [Google Scholar] [CrossRef]
Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
Chai, S.S.; Walker, J.P.; Makarynskyy, O.; Kuhn, M.; Veenendaal, B.; West, G. Use of Soil Moisture Variability in Artificial Neural Network Retrieval of Soil Moisture. Remote Sens. 2009, 2, 166–190. [Google Scholar] [CrossRef]
Ahmad, S.; Kalra, A.; Stephen, H. Estimating soil moisture using remote sensing data: A machine learning approach. Adv. Water Resour. 2010, 33, 69–80. [Google Scholar] [CrossRef]
Das, N.N.; Mohanty, B.P. Root Zone Soil Moisture Assessment Using Remote Sensing and Vadose Zone Modeling. Vadose Zone J. 2006, 5, 296–307. [Google Scholar] [CrossRef]
Kumar, S.V.; Reichle, R.H.; Peters-Lidard, C.D.; Koster, R.D.; Zhan, X.; Crow, W.T.; Houser, P.R. A land surface data assimilation framework using the land information system: Description and applications. Adv. Water Resour. 2008, 31, 1419–1432. [Google Scholar] [CrossRef]
Chen, W.; Huang, C.; Shen, H.; Li, X. Comparison of ensemble-based state and parameter estimation methods for soil moisture data assimilation. Adv. Water Resour. 2015, 86, 425–438. [Google Scholar] [CrossRef]
Liu, D.; Yu, Z.B.; Liu, H.S. Data assimilation using support vector machines and ensemble Kalman filter for multi-layer soil moisture prediction. Water Sci. Eng. 2010, 3, 361–377. [Google Scholar]
Elsaadani, M.; Habib, E.; Abdelhameed, A.M.; Bayoumi, M. Assessment of a Spatiotemporal Deep Learning Approach for Soil Moisture Prediction and Filling the Gaps in Between Soil Moisture Observations. Front. Artif. Intell. 2021, 4, 636234. [Google Scholar] [CrossRef] [PubMed]
Gao, P.; Qiu, H.; Lan, Y.; Wang, W.; Chen, W.; Han, X.; Lu, J. Modeling for the Prediction of Soil Moisture in Litchi Orchard with Deep Long Short-Term Memory. Agriculture 2022, 12, 25. [Google Scholar] [CrossRef]
Datta, P.K.; Salah, F.A. A Multihead LSTM Technique for Prognostic Prediction of Soil Moisture. Geoderma 2023, 433, 116452. [Google Scholar] [CrossRef]
Adamowski, J.; Fung Chan, H.; Prasher, S.O.; Ozga-Zielinski, B.; Sliusarieva, A. Comparison of Multiple Linear and Nonlinear Regression, Autoregressive Integrated Moving Average, Artificial Neural Network, and Wavelet Artificial Neural Network Methods for Urban Water Demand Forecasting in Montreal, Canada. Water Resour. Res. 2012, 48, 273–279. [Google Scholar] [CrossRef]
Prasad, R.; Ravinesh, C.L.; Tek, Y.M. Weekly Soil Moisture Forecasting with Multivariate Sequential, Ensemble Empirical Mode Decomposition and Boruta-Random Forest Hybridizer Algorithm Approach. CATENA 2019, 177, 149–166. [Google Scholar] [CrossRef]
Kim, H.; Choi, M. An inter-comparison of active and passive satellite soil moisture products in east asia for dust-outbreak prediction. J. Korean Soc. Hazard Mitig. 2015, 15, 53–58. [Google Scholar] [CrossRef]
Jamei, M.; Ali, M.; Karbasi, M.; Sharma, E.; Jamei, M.; Chu, X.; Yaseen, Z.M. A High Dimensional Features-Based Cascaded Forward Neural Network Coupled with MVMD and Boruta-GBDT for Multi-step Ahead Forecasting of Surface Soil Moisture. Eng. Appl. Artif. Intell. 2023, 120, 105895. [Google Scholar] [CrossRef]
Li, T.S.; Xia, J.; Zhang, L.; She, D.; Wang, G.; Cheng, L. An Improved Complementary Relationship for Estimating Evapotranspiration Attributed to Climate Change and Revegetation in the Loess Plateau, China. J. Hydrol. 2021, 592, 125516. [Google Scholar] [CrossRef]
Chakraborty, D.; Başağaoğlu, H.; Winterle, J. Interpretable vs. noninterpretable machine learning models for data-driven hydro-climatological process modeling. Expert Syst. Appl. 2021, 170, 114498. [Google Scholar] [CrossRef]
Kisi, O. Modeling Reference Evapotranspiration Using Three Different Heuristic Regression Approaches. Agric. Water Manag. 2016, 169, 162–172. [Google Scholar]
Kumar, M.; Raghuwanshi, N.S.; Singh, R.; Wallender, W.W.; Pruitt, W.O. Estimating Evapotranspiration Using Artificial Neural Network. J. Irrig. Drain. Eng. 2002, 128, 224–233. [Google Scholar] [CrossRef]
Adamala, S.; Raghuwanshi, N.S.; Mishra, A.; Tiwari, M.K. Evapotranspiration Modeling Using Second-Order Neural Networks. J. Hydrol. Eng. 2014, 19, 1131–1140. [Google Scholar] [CrossRef]
Antonopoulos, V.Z.; Antonopoulos, A.V. Daily Reference Evapotranspiration Estimates by Artificial Neural Networks Technique and Empirical Equations Using Limited Input Climate Variables. Comput. Electron. Agric. 2017, 132, 86–96. [Google Scholar] [CrossRef]
Sahoo, S.; Russo, T.A.; Elliott, J.; Foster, I. Machine Learning Algorithms for Modeling Groundwater Level Changes in Agricultural Regions of the U.S. Water Resour. Res. 2017, 53, 3878–3895. [Google Scholar] [CrossRef]
Rangapuram, S.S.; Seeger, M.W.; Gasthaus, J.; Stella, L.; Wang, Y.; Januschowski, T. Deep State Space Models for Time Series Forecasting. Adv. Neural Inf. Process. Syst. 2018, 31, 7796–7805. [Google Scholar]
Chen, Z.; Zhu, Z.; Jiang, H.; Sun, S. Estimating Daily Reference Evapotranspiration Based on Limited Meteorological Data Using Deep Learning and Classical Machine Learning Methods. J. Hydrol. 2020, 591, 125286. [Google Scholar] [CrossRef]
Karbasi, M.; Jamei, M.; Ali, M.; Malik, A.; Yaseen, Z.M. Forecasting Weekly Reference Evapotranspiration Using Auto Encoder Decoder Bidirectional LSTM Model Hybridized with a Boruta-CatBoost Input Optimizer. Comput. Electron. Agric. 2022, 198, 107121. [Google Scholar] [CrossRef]
Adnan, R.M.; Mostafa, R.R.; Islam, A.; Islam, A.R.M.T.; Kisi, O.; Kuriqi, A.; Heddam, S. Estimating Reference Evapotranspiration Using Hybrid Adaptive Fuzzy Inferencing Coupled with Heuristic Algorithms. Comput. Electron. Agric. 2021, 191, 106541. [Google Scholar] [CrossRef]
Alizamir, M.; Kisi, O.; Adnan, R.M.; Kuriqi, A. Modelling Reference Evapotranspiration by Combining Neuro-Fuzzy and Evolutionary Strategies. Acta Geophys. 2020, 68, 1113–1126. [Google Scholar] [CrossRef]
Maroufpoor, O.M.E. Reference Evapotranspiration Estimating Based on Optimal Input Combination and Hybrid Artificial Intelligent Model: Hybridization of Artificial Neural Network with Grey Wolf Optimizer Algorithm. J. Hydrol. 2020, 588, 125060. [Google Scholar] [CrossRef]
Roy, D.K.; Barzegar, R.; Quilty, J.; Adamowski, J. Using Ensembles of Adaptive Neuro-Fuzzy Inference System and Optimization Algorithms to Predict Reference Evapotranspiration in Subtropical Climatic Zones. J. Hydrol. 2020, 591, 125509. [Google Scholar] [CrossRef]
Troncoso-García, A.R.; Brito, I.S.; Troncoso, A.; Martínez-Álvarez, F. Explainable Hybrid Deep Learning and Coronavirus Optimization Algorithm for Improving Evapotranspiration Forecasting. Comput. Electron. Agric. 2023, 215, 108387. [Google Scholar] [CrossRef]
Başağaoğlu, H.; Chakraborty, D.; Winterle, J. Reliable Evapotranspiration Predictions with a Probabilistic Machine Learning Framework. Water 2021, 13, 557. [Google Scholar] [CrossRef]
Hydrology: Groundwater stores running dry. Nature 2010, 467, 636. [CrossRef]
Zhang, Q.; Li, P.; Ren, X.; Ning, J.; Li, J.H.; Liu, C.S.; Wang, Y.; Wang, G.Q. A new real-time groundwater level forecasting strategy: Coupling hybrid data-driven models with remote sensing data. J. Hydrol. 2023, 625, 129962. [Google Scholar] [CrossRef]
Liu, Q.; Gui, D.; Zhang, L.; Niu, J.; Dai, H.; Wei, G.; Hu, B. Simulation of regional groundwater levels in arid regions using interpretable machine learning models. Sci. Total Environ. 2022, 831, 154902. [Google Scholar] [CrossRef] [PubMed]
Niu, X.; Lu, C.; Zhang, Y.; Wu, C.; Ebrima, S.; Liu, B.; Shu, L. Hysteresis response of groundwater depth on the influencing factors using an explainable learning model framework with Shapley values. Sci. Total Environ. 2023, 904, 166662. [Google Scholar] [CrossRef] [PubMed]
Aderemi, B.A.; Olwal, T.O.; Ndambuki, J.M.; Rwanga, S.S. Groundwater levels forecasting using machine learning models: A case study of the groundwater region 10 at Karst Belt, South Africa. Syst. Soft Comput. 2023, 5, 200049. [Google Scholar] [CrossRef]
Sun, A.Y.; Scanlon, B.R. How can big data and machine learning benefit environment and water management: A survey of methods, applications, and future directions. Environ. Res. Lett. 2019, 14, 073001. [Google Scholar] [CrossRef]
Iqbal, M.; Naeem, A.U.; Ahmad, A.; Rehman, H.; Ghani, U.; Farid, T. Relating groundwater levels with meteorological parameters using ANN technique. Measurement 2020, 166, 108163. [Google Scholar] [CrossRef]
Natarajan, N.; Sudheer, C. Groundwater level forecasting using soft computing techniques. Neural Comput. Applic. 2020, 32, 7691–7708. [Google Scholar] [CrossRef]
Liu, D.; Mishra, A.K.; Yu, Z.B.; Lv, H.; Li, Y. Support vector machine and data assimilation framework for Groundwater Level Forecasting using GRACE satellite data. J. Hydrol. 2021, 603, 126929. [Google Scholar] [CrossRef]
Rahman, A.T.M.S.; Hosono, T.; John, M.Q.; Das, J.; Basak, A. Multiscale groundwater level forecasting: Coupling new machine learning approaches with wavelet transforms. Adv. Water Resour. 2020, 141, 103595. [Google Scholar] [CrossRef]
Aydin, H.E.; Iban, M.C. Predicting and analyzing flood susceptibility using boosting-based ensemble machine learning algorithms with SHapley Additive exPlanations. Nat. Hazards 2023, 116, 2957–2991. [Google Scholar] [CrossRef]
Virro, H.; Kmoch, A.; Vainu, M.; Uuemaa, E. Random forest-based modeling of stream nutrients at national level in a data-scarce region. Sci. Total Environ. 2022, 840, 156613. [Google Scholar] [CrossRef]
Huang, R.X.; Ma, C.X.; Ma, J.; Huangfu, L.; He, Q. Machine learning in natural and engineered water systems. Water Res. 2021, 205, 117666. [Google Scholar] [CrossRef] [PubMed]
Zhu, M.Y.; Wang, J.W.; Yang, X.; Zhang, Y.; Zhang, L.Y.; Ren, H.Q.; Wu, B.; Ye, L. A review of the application of machine learning in water quality evaluation. Eco-Environ. Health 2022, 1, 107–116. [Google Scholar] [CrossRef] [PubMed]
Ahmed, A.N.; Othman, F.B.; Afan, H.A.; Ibrahim, R.K.; Fai, C.M.; Hossain, M.S.; Ehteram, M.; Elshafie, A. Machine learning methods for better water quality prediction. J. Hydrol. 2019, 578, 124084. [Google Scholar] [CrossRef]
Noori, R.; Berndtsson, R.; Hosseinzadeh, M.; Adamowski, J.F.; Abyaneh, M.R. A critical review on the application of the National Sanitation Foundation Water Quality Index. Environ. Pollut. 2019, 244, 575–587. [Google Scholar] [CrossRef] [PubMed]
Abbasi, T.; Abbasi, S.A. Water Quality Indices; Elsevier: Amsterdam, The Netherlands, 2012; ISBN 978-0-444-54304-2. [Google Scholar]
Mijares, V.; Gitau, M.; Johnson, D.R. A Method for Assessing and Predicting Water Quality Status for Improved Decision-Making and Management. Water Resour. Manag. 2019, 33, 509–522. [Google Scholar] [CrossRef]
Uddin, M.G.; Nash, S.; Olbert, A.I. A review of water quality index models and their use for assessing surface water quality. Ecol. Indic. 2021, 122, 107218. [Google Scholar] [CrossRef]
Uddin, M.G.; Nash, S.; Diganta, M.T.M.; Rahman, A.; Olbert, A.I. Robust machine learning algorithms for predicting coastal water quality index. J. Environ. Manag. 2022, 321, 115923. [Google Scholar] [CrossRef]
Ding, F.; Zhang, W.J.; Chen, L.Y.; Sun, Z.G.; Li, W.P.; Li, C.; Jiang, M.C. Water quality assessment using optimized CWQII in Taihu Lake. Environ. Res. 2022, 214, 113713. [Google Scholar] [CrossRef]
Uddin, M.G.; Nash, S.; Rahman, A.; Olbert, A.I. A comprehensive method for improvement of water quality index (WQI) models for coastal water quality assessment. Water Res. 2022, 219, 118532. [Google Scholar] [CrossRef]
Talukdar, S.; Shahfahad; Ahmed, S.; Naikoo, M.W.; Rahman, A.; Mallik, S.; Ningthoujam, S.; Bera, S.; Ramana, G.V. Predicting lake water quality index with sensitivity-uncertainty analysis using deep learning algorithms. J. Clean. Prod. 2023, 406, 136885. [Google Scholar] [CrossRef]
Brester, C.; Ryzhikov, I.; Siponen, S.; Jayaprakash, B.; Ikonen, J.; Pitkanen, T.; Miettinen, I.K.; Torvinen, E.; Kolehmainen, M. Potential and limitations of a pilot-scale drinking water distribution system for bacterial community predictive modelling. Sci. Total Environ. 2020, 717, 137249. [Google Scholar] [CrossRef]
Liu, P.; Wang, J.; Sangaiah, A.K.; Xie, Y.; Yin, X.C. Analysis and Prediction of Water Quality Using LSTM Deep Neural Networks in IoT Environment. Sustainability 2019, 11, 2058. [Google Scholar] [CrossRef]
Sokolova, E.; Ivarsson, O.; Lillieström, A.; Nora, K.S.; Rydberg, H.; Bondelind, M. Data-driven models for predicting microbial water quality in the drinking water source using E. coli monitoring and hydrometeorological data. Sci. Total Environ. 2022, 802, 149798. [Google Scholar] [CrossRef]
Chakraborty, D.; Başağaoğlu, H.; Gutierrez, L.; Mirchi, A. Explainable AI reveals new hydroclimatic insights for ecosystem-centric groundwater management. Environ. Res. Lett. 2021, 16, 114024. [Google Scholar] [CrossRef]
Stef, N.; Başağaoğlu, H.; Chakraborty, D.; Jabeur, S.B. Does institutional quality affect CO₂ emissions? Evidence from explainable artificial intelligence models. Energy Econ. 2023, 124, 106822. [Google Scholar] [CrossRef]

Figure 1. Schematic diagram of water resource cycle.

Figure 2. Machine learning in water resource modeling. USGS stands for United States Geological Survey; NASA-SRTM denotes NASA's Shuttle Radar Topography Mission; CNEMC denotes China National Environmental Monitoring Centre; SWAT stands for Soil and Water Assessment Tool; GES DISC stands for the Goddard Earth Sciences Data and Information Services Center.

Table 1. Some machine learning methods and their advantages and disadvantages.

Algorithm	Application	Advantages/Disadvantages
LR	Classification	Suitable for binary classification; the output can be interpreted as a probability/not effective for complex nonlinear problems.
LDA	Classification/Dimensionality reduction	Suitable for multi-class classification and can also be used for dimensionality reduction/not suitable for dimensionality reduction of samples with non-Gaussian distributions.
Perceptron	Classification	Suitable for large-scale datasets (short training time)/sensitive to noise and outliers; poor performance on nonlinear problems.
ANN	Classification/Regression	Suitable for solving complex nonlinear problems/no unified theoretical guidance for designing the network structure; easily trapped in local optima.
SVM	Classification	Not easily affected by noise/long training time; requires normalization.
Naive Bayes	Classification	Not sensitive to outliers or missing values/poor classification performance on datasets with strongly dependent features.
ID3	Classification	Can train models with missing feature values/splits tend to favor attributes with a larger number of distinct values.
CART	Classification/Regression	Can solve classification or regression problems/cannot guarantee a globally optimal solution.
RF	Classification/Regression	No normalization required; supports parallel processing/not suitable for small, low-dimensional datasets.
KNN	Classification/Regression	No explicit training phase/the model's performance depends heavily on the choice of k.
DeepForest	Classification/Regression	Far fewer hyperparameters than deep ANNs/high memory consumption; training runs on CPU only.
Linear regression	Regression	Suitable for large-scale datasets; high computational efficiency/cannot handle nonlinear problems.
Ridge regression	Regression	Can be used to solve multicollinearity problems/not suitable for feature selection.
Lasso regression	Regression	Suitable for feature selection; addresses multicollinearity/the regularization coefficient is difficult to select.
SVR	Regression	Suitable for handling complex nonlinear regression problems/difficult parameter selection (kernel function, regularization parameters).
Kmeans	Clustering	Fast convergence/the value of k must be determined in advance.
LVQ	Clustering	Less sensitive to noise than Kmeans/slow convergence.
DBSCAN	Clustering	No need to specify the number of clusters k in advance/the parameters ε (Eps) and MinPts have a significant impact on the results.
AGNES	Clustering	Can reveal the hierarchical relationships between clusters/the clustering results are strongly affected by outliers.
GMM	Clustering	Suitable for big data exploration/the definition of core objects has a significant impact on the results.
PCA	Dimensionality reduction	Removes correlations between features and does not depend on sample labels/the principal components lack interpretability.
t-SNE	Dimensionality reduction	Retain similarity information in data, suitable for visualizing high-dimensional data/poor performance for large-scale data (high computational complexity).
Locally linear embedding	Dimensionality reduction	The computational complexity is relatively small, making it easy to implement/different numbers of nearest neighbors have a significant impact on the final dimensionality reduction results.
ISOMAP	Dimensionality reduction	Invariant to the overall translation, rotation, and flipping of the sample dataset/difficult to recover the inherent structure of the dataset with excessive noise.
Laplacian eigenmaps	Dimensionality reduction	Insensitive to noise and isolated points/the heat-kernel parameter is difficult to select.
Locality preserving projections	Dimensionality reduction	Reduce dimensionality while preserving local nearest neighbor node information/can only be used for linear dimensionality reduction.
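To make the KNN row of Table 1 concrete, the following minimal Python sketch (standard library only) implements k-nearest-neighbor regression. The function name `knn_predict` and the toy rainfall/soil-moisture/runoff data are hypothetical and are not taken from any study cited here. The sketch shows both properties listed in the table: nothing is fitted in advance, because the training samples are simply stored and searched at prediction time, and the prediction for the same query changes markedly with k.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Predict a value for `query` as the mean target of its k nearest training points.

    There is no fitting step: the training data are stored and searched at
    prediction time ("lazy learning").
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Euclidean distance from the query to every stored sample, paired with its target
    scored = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    scored.sort(key=lambda pair: pair[0])
    # Average the targets of the k closest samples
    return sum(y for _, y in scored[:k]) / k


# Hypothetical toy data: (rainfall in mm, antecedent soil moisture in %) -> runoff in mm
train_x = [(10.0, 20.0), (12.0, 22.0), (30.0, 35.0), (32.0, 38.0), (50.0, 40.0)]
train_y = [1.0, 1.5, 6.0, 7.0, 15.0]
query = (11.0, 20.0)

for k in (1, 3, 5):
    print(f"k = {k}: predicted runoff = {knn_predict(train_x, train_y, query, k):.2f} mm")
```

For this query, k = 1 returns 1.0 mm, k = 3 about 2.83 mm, and k = 5 returns 6.1 mm, which is why k is usually tuned, for example by cross-validation. Because the distance is computed on raw values, features measured in different units would normally be scaled first; that step is omitted here for brevity.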

Table 2. Common evaluation indicators for machine learning.

Index	Application	Formula
$Acc$	Classification	$(TP + TN) / (TP + TN + FP + FN)$
$Sn$	Classification	$TP / (TP + FN)$
$Sp$	Classification	$TN / (TN + FP)$
$MCC$	Classification	$\frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) \times (TP + FN) \times (TN + FP) \times (TN + FN)}}$
$F1$	Classification	$2 \times TP / (2 \times TP + FP + FN)$
$R^{2}$	Regression	$[\sum_{i = 1}^{N} (y_{i} - \bar{y})^{2} - \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}] / \sum_{i = 1}^{N} (y_{i} - \bar{y})^{2}$
$MSE$	Regression	$(1 / N) \times \sum_{i = 1}^{N} (\hat{y}_{i} - y_{i})^{2}$
$EVS$	Regression	$[\mathrm{Var}(y) - \mathrm{Var}(y - \hat{y})] / \mathrm{Var}(y)$
$MAE$	Regression	$(1 / N) \times \sum_{i = 1}^{N} |y_{i} - \hat{y}_{i}|$
$RMSE$	Regression	$\sqrt{(1 / N) \times \sum_{i = 1}^{N} (y_{i} - \hat{y}_{i})^{2}}$
$JC$	Clustering	$|SS| / (|SS| + |SD| + |DS|)$
$FMI$	Clustering	$\sqrt{|SS|^{2} / [(|SS| + |SD|) \times (|SS| + |DS|)]}$
$RI$	Clustering	$2 \times (|SS| + |DD|) / [m \times (m - 1)]$

Note: In classification, TP denotes true samples predicted as true, TN denotes false samples predicted as false, FP denotes false samples predicted as true, and FN denotes true samples predicted as false. For regression, $N$ is the total number of samples, $y_{i}$ is the true labeled value of the $i$th sample, $\bar{y}$ is the mean of the sample labels, and $\hat{y}_{i}$ is the predicted value of the $i$th sample. For clustering, $m$ is the total number of samples; $|SS|$ is the number of sample pairs assigned to the same cluster that also belong to the same cluster in the reference partition; $|SD|$ is the number of pairs assigned to the same cluster but belonging to different clusters in the reference partition; $|DS|$ is the number of pairs assigned to different clusters but belonging to the same cluster in the reference partition; and $|DD|$ is the number of pairs assigned to different clusters that also belong to different clusters in the reference partition.
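The indicators in Table 2 can be checked with a short, dependency-free Python sketch. The function names below are illustrative and do not belong to any particular library. Two assumptions are made: MCC is set to 0 when its denominator is zero (a common convention), and the clustering functions assume the pair counts leave no denominator at zero. $R^{2}$ is computed as $1 - \sum (y_{i} - \hat{y}_{i})^{2} / \sum (y_{i} - \bar{y})^{2}$, which is algebraically identical to the Table 2 form.

```python
import math
from itertools import combinations


def classification_metrics(tp, tn, fp, fn):
    """Acc, Sn, Sp, F1 and MCC from confusion-matrix counts (Table 2)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)  # sensitivity (recall)
    sp = tn / (tn + fp)  # specificity
    f1 = 2 * tp / (2 * tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention if undefined
    return {"Acc": acc, "Sn": sn, "Sp": sp, "F1": f1, "MCC": mcc}


def regression_metrics(y, y_hat):
    """R2, MSE, RMSE, MAE and EVS for true values y and predictions y_hat."""
    n = len(y)
    y_bar = sum(y) / n
    residuals = [yi - pi for yi, pi in zip(y, y_hat)]
    ss_res = sum(r * r for r in residuals)        # sum of squared errors
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)   # total sum of squares
    mse = ss_res / n
    r_bar = sum(residuals) / n
    var_y = ss_tot / n
    var_res = sum((r - r_bar) ** 2 for r in residuals) / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(r) for r in residuals) / n,
        "EVS": (var_y - var_res) / var_y,
    }


def pair_counts(labels, reference):
    """Count |SS|, |SD|, |DS| and |DD| over all unordered sample pairs."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(labels)), 2):
        same_cluster = labels[i] == labels[j]
        same_reference = reference[i] == reference[j]
        if same_cluster and same_reference:
            ss += 1
        elif same_cluster:
            sd += 1
        elif same_reference:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def clustering_metrics(labels, reference):
    """JC, FMI and RI from the pair counts (m = number of samples)."""
    ss, sd, ds, dd = pair_counts(labels, reference)
    m = len(labels)
    return {
        "JC": ss / (ss + sd + ds),
        "FMI": math.sqrt((ss / (ss + sd)) * (ss / (ss + ds))),
        "RI": 2 * (ss + dd) / (m * (m - 1)),
    }


print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
print(clustering_metrics([0, 0, 1, 1], [0, 0, 1, 2]))
```

With TP = 40, TN = 45, FP = 5 and FN = 10, the sketch gives Acc = 0.85, Sn = 0.80 and Sp = 0.90. For the toy clustering, the pair counts are |SS| = 1, |SD| = 1, |DS| = 0 and |DD| = 4, so JC = 0.5 and RI ≈ 0.83.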

Table 3. Application of machine learning in water resource modeling.

Reference	Algorithm	Study Area	Dataset	Framework	Model Performance	Water Field
Hu and Wu (2021) [38]	ARIMA	Jiangxi	Acquired	MATLAB	/	Rainfall
Gui and Shao (2017) [40]	Markov chain	Dangshan	Existing	/	Relative error = 1.9%	Rainfall
Dimri et al. (2008) [41]	KNN	India	Existing	Python	ACC = 71–88%	Rainfall
Ghazvinian et al. (2020) [42]	ANN	Semnan	Acquired	Python	MAE = 2.3261	Rainfall
Wolfensberger et al. (2021) [43]	RF, RZC	SwissMetNet	Existing	Python	RMSE = 1.35	Rainfall
Umirbekov et al. (2022) [44]	SVR	Tian-Shan Mountain	Acquired	/	R² = 0.57	Rainfall
Kumar et al. (2019) [45]	LSTM and RNN	India	Acquired	Python	RMSE = 424.23	Rainfall
Shen and Ban (2023) [46]	Hybrid model	Lanzhou	Existing	MATLAB and Python	RMSE = 16.41	Rainfall
Bi et al. (2023) [48]	CNN	/	Existing	/	/	Rainfall
Latt et al. (2014) [55]	SMLR	Myanmar	Existing	MATLAB	R² = 0.99	Flood
Wang et al. (2017) [14]	BP	Dingan River	Acquired	Python	NSE = 0.937	Flood
Liong et al. (2007) [58]	SVM	Bangladesh	Existing	/	R² = 0.931	Flood
Nguyen et al. (2020) [59]	V-SVR	Taiwan	Acquired	Python	RMSE = 0.07	Flood
Lohani et al. (2012) [60]	ANN	Bhakra Dam	Existing	Python	RMSE = 256.30	Flood
Li et al. (2016) [61]	ANN	Poyang Lake	Acquired	Python	R² = 0.999	Flood
Kabir et al. (2020) [62]	CNN	Northwest England	Existing	Python	RMSE = 0.08	Flood
Hosseiny et al. (2021) [63]	U-NetRiver	United States	Existing	Python	MAE = 0.0077	Flood
Zhang and Pan (2014) [70]	/	Harbin	Existing	/	/	Storm
Huang et al. (2017) [71]	/	Beijing	Acquired	/	ACC = 99.51%	Storm
Zeng et al. (2017) [72]	/	Dongguan	Existing	/	/	Storm
Lai et al. (2017) [76]	SOM-ANN	Beijing	Collected	MATLAB	/	Storm
Yan et al. (2018) [77]	SVM	Hangzhou	Existing	Python	RMSE = 0.038	Storm
Wu et al. (2020) [78]	GBDT	Zhengzhou	Acquired	Python	MRE = 62.76%	Storm
Kim and Han (2020) [81]	SOM	Korea	Acquired	/	RMSE = 0.032	Storm
Ghumman et al. (2011) [90]	ANN	Hub River	Acquired	/	RMSE = 3.27	Runoff
Tan et al. (2018) [91]	ANN	Yangtze River	Acquired	/	R = 0.983	Runoff
Liao et al. (2023) [92]	ANN	Lancang River	Acquired	/	RMSE = 252.74	Runoff
Han et al. (2023) [93]	LSTM	Yangtze River	Acquired	/	R = 0.928	Runoff
Kratzert et al. (2018) [84]	LSTM	United States	Existing	Python	NSE = 0.90	Runoff
Hu et al. (2018) [95]	ANN/LSTM	Fen River	Acquired	/	R² = 0.95	Runoff
Gao et al. (2020) [96]	LSTM	Shaxi River	Acquired	Python	MAE = 6.0, NSE = 0.990	Runoff
Naganna et al. (2023) [97]	CNN/RF/MLP	Cauvery River	Acquired	/	SMAPE = 22.7965	Runoff
Chai et al. (2009) [106]	ANN	Goulburn River	Acquired	/	RMSE = 7.8, R² = 0.67	Soil moisture
Ahmad et al. (2010) [107]	SVM	Lower Colorado River	Acquired	/	RMSE = 1.19, R² = 0.74	Soil moisture
Liu et al. (2010) [111]	SVM	Yixing	Acquired	/	R = 0.9122	Soil moisture
ElSaadani et al. (2021) [112]	LSTM	South Louisiana	Existing	Python	NRMSE = 5.2%	Soil moisture
Gao et al. (2022) [113]	LSTM	Guangzhou	Collected	Python	RMSE = 0.48%, R² = 0.94	Soil moisture
Datta et al. (2023) [114]	LSTM	San Antonio Mountain	Acquired	/	R² = 0.9209, RMSE = 0.0217	Soil moisture
Prasad et al. (2019) [116]	Hybrid model	New South Wales	Acquired	MATLAB	R = 0.954, RMSE = 0.033	Soil moisture
Jamei et al. (2023) [118]	Hybrid model	Iran	Acquired	Python	R = 0.9993, RMSE = 0.0025	Soil moisture
Kumar et al. (2002) [122]	ANN	California	Existing	/	MAE = 9.2, R² = 0.949	Evapotranspiration
Adamala et al. (2014) [123]	SONN	India	Acquired	MATLAB	RMSE = 0.077, R² = 0.998	Evapotranspiration
Antonopoulos et al. (2017) [124]	ANN	West Macedonia	Acquired	/	R = 0.876, RMSE = 0.936	Evapotranspiration
Chen et al. (2020) [127]	ANN	China	Acquired	/	R² = 0.831, RMSE = 0.755	Evapotranspiration
Karbasi et al. (2022) [128]	AED-BiLSTM	Iran	Existing	MATLAB and Python	R = 0.9835, RMSE = 3.4597	Evapotranspiration
Maroufpoor et al. (2020) [131]	ANN-GWO	Iran	Acquired	MATLAB	R² = 0.884, MAE = 0.717	Evapotranspiration
Roy et al. (2020) [132]	ANFIS	Bangladesh	Acquired	MATLAB	R = 1.000, RMSE = 0.021	Evapotranspiration
Troncoso-García et al. (2023) [133]	LSTM	Europe	Acquired	Python	RMSE = 1.0614, R² = 0.7194	Evapotranspiration
Iqbal et al. (2020) [141]	ANN	Ravi and Sutlej River	Acquired	MATLAB	MAE = 0.031, R = 0.974	Groundwater level
Natarajan et al. (2020) [142]	ELM	India	Existing	/	RMSE = 0.277	Groundwater level
Liu et al. (2021) [143]	SVM-DA	Northeast US	Acquired	/	R = 0.96, MAE = 0.44	Groundwater level
Rahman et al. (2020) [144]	WT-XGB	Kumamoto	Acquired	R	R² = 0.84, NSE = 0.81	Groundwater level
Zhang et al. (2023) [136]	WT-LSTM	Xi’an and Yinchuan	Existing	/	NSE = 0.843	Groundwater level
Uddin et al. (2022) [154]	ExT and XGB	Ireland	Acquired	/	R² = 1, RMSE = 0.0	WQI
Uddin et al. (2022) [156]	XGB, MLR	Ireland	Acquired	/	R² = 0.97, RMSE = 3.1	WQI
Talukdar et al. (2023) [157]	CNN, DNN	India	Acquired	R	RMSE = 5.07, R² = 0.98	WQI
Brester et al. (2020) [158]	RF	Kuopio, Finland	Acquired	/	IA = 0.92	Drinking water
Liu et al. (2019) [159]	LSTM	Yangzhou	Acquired	Python	MSE = 0.02	Drinking water
Sokolova et al. (2022) [160]	TPOT	Gothenburg, Sweden	Acquired	Python	R² = 0.62, MAE = 0.22	Drinking water
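Several studies in Table 3 report indicators that Table 2 does not define, notably the Nash–Sutcliffe efficiency (NSE) and the index of agreement (IA). The minimal Python sketch below assumes the standard Nash–Sutcliffe definition and Willmott's index of agreement; individual studies may use variants of either. The function names and the discharge series are hypothetical, for illustration only.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - sum((O - S)^2) / sum((O - mean(O))^2).

    NSE = 1 is a perfect fit; NSE <= 0 means the model is no better than
    always predicting the observed mean.
    """
    o_bar = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - o_bar) ** 2 for o in observed)
    return 1 - num / den


def index_of_agreement(observed, simulated):
    """Willmott's index of agreement:
    d = 1 - sum((S - O)^2) / sum((|S - mean(O)| + |O - mean(O)|)^2), bounded in [0, 1].
    """
    o_bar = sum(observed) / len(observed)
    num = sum((s - o) ** 2 for o, s in zip(observed, simulated))
    den = sum((abs(s - o_bar) + abs(o - o_bar)) ** 2 for o, s in zip(observed, simulated))
    return 1 - num / den


# Hypothetical daily discharge series (m^3/s), for illustration only
observed = [12.0, 15.0, 30.0, 22.0, 18.0]
simulated = [11.0, 16.0, 27.0, 23.0, 19.0]
print(f"NSE = {nse(observed, simulated):.3f}")
print(f"IA  = {index_of_agreement(observed, simulated):.3f}")
```

Both indicators equal 1 for a perfect fit. NSE can become negative, whereas IA cannot fall below 0 because each term of its numerator is bounded by the corresponding term of its denominator; for the toy series, NSE ≈ 0.93 and IA ≈ 0.98.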


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Liu, Z.; Zhou, J.; Yang, X.; Zhao, Z.; Lv, Y. Research on Water Resource Modeling Based on Machine Learning Technologies. Water 2024, 16, 472. https://doi.org/10.3390/w16030472


