This section discusses IAI- and XAI-based predictions in diverse nonlinear hydroclimatic processes from multidimensional predictors. A list of predictors and notations used for each hydroclimatic application is summarized in a table in the same section. Only repeatedly used variables and notations, in addition to acronyms for the IAI and XAI models, are provided in the Abbreviation section.
4.2.1. Evapotranspiration Predictions
Evapotranspiration (
) is a critical indicator of global climate change [
51], and its reliable prediction is imperative for irrigation, agriculture, and surface water and groundwater management and planning [
52].
is the sum of evaporation from soil and transpiration from vegetation. It is often reported as the reference crop evapotranspiration (
), actual evapotranspiration (
), potential evapotranspiration from wet surfaces (
), or surfaces covered by a large volume of water, such as wetlands or lakes (
).
is commonly predicted from a time series of meteorological predictors, including
,
,
,
, and
.
from open water surfaces also include the surface water temperature (
) as a predictor. Terrestrial
requires information on vegetation cover.
The FAO56-Penman Monteith Equation (PME) [
53] has been commonly used to predict
from
,
,
,
, and
; however, complete meteorologic data are not available at some locations across the globe. Therefore, AI-based
predictions from incomplete meteorological data have been examined in the literature. The performance of several tree-based, kernel-based, and curve-based AI models was compared in predicting daily
from daily minimum and maximum
(
,
), and
P at 14 stations from 2001 to 2015 in different climate zones in China [
54]. Use of
P as a predictor for
, however, is uncommon and inconsistent with the PME. The authors assumed that
P would represent
especially in (sub)tropical-humid regions, which remains questionable. Based on this assumption and using 70% of the data to train the models, the authors concluded that Support Vector Machine (SVM) predicted daily
with the highest accuracy while outperforming the XGBoost model. In a different study, the prediction accuracy of the CatBoost, RF, and Generalized Regression Neural Network models was compared in estimating
in arid and semi-arid regions in China [
55]. Eight different combinations of
,
,
,
, and
that were monitored from 1996 to 2015 at 15 stations were used as predictors. The 1996–2009 records were used for training and the 2010–2015 for testing the AI models. All the AI models performed well with incomplete data when only
was not included as a predictor. Therefore, the authors recommended these AI models to predict
at the sites with missing meteorological data. In addition, CatBoost exhibited the best performance for all the combinations of data, and hence, was recommended for regions with similar climates. In a similar study, the performance of the optimized CatBoost, RF, and SVM models was compared in predicting daily
at 12 weather stations in a subtropical region in China using different combinations of daily local meteorologic predictors of
,
,
,
, and
under presumably water-scarce conditions [
56]. The data from 2001–2010 and 2011–2015 were used to train and test the AI models, respectively. The authors concluded that all three AI models achieved satisfactory accuracy for
prediction using either
,
,
or
,
,
,
, suggesting that either reduced predictor set can be used in water-scarce subtropical regions to predict
. They also noted that when
,
,
,
, and
were available, CatBoost yielded the best prediction accuracy. Conversely, SVM yielded the best prediction accuracy when some of the climatic data types were missing. In brief, [
55,
56] indicated that SVM (a non-IAI model) is a better prediction tool when some meteorologic data are missing, whereas CatBoost (an IAI model) is a better prediction tool when complete meteorologic data are available for AI-based
predictions.
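The year-based hold-out used in [55,56] (earlier years for training, later years for testing) can be sketched as follows. This is a minimal illustration rather than the authors' code; the record layout (dictionaries with a `year` key) and the function name `split_by_year` are assumptions made for the example.

```python
def split_by_year(records, last_train_year):
    """Split records chronologically: years <= last_train_year go to training."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


# Hypothetical records, one per year; real data would hold daily observations.
records = [{"year": y, "value": float(y % 7)} for y in range(1996, 2016)]

# 1996-2009 for training and 2010-2015 for testing, as described for [55].
train, test = split_by_year(records, last_train_year=2009)
print(len(train), len(test))
```

A chronological split of this kind avoids testing on years that precede the training period, which a random split would not guarantee.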
To further evaluate the relative performance of the IAI and non-IAI models in predicting
from a complete set of multi-dimensional meteorological data, the predictive accuracy of three optimized IAI models (XGBoost, RF, Linear Regression (LR)) and three optimized non-IAI models (DL, SVM, Long Short-Term Memory (LSTM)) was compared in estimating daily
computed by FAO56-PME from structured tabular data, including
,
,
,
, and
over 4–5 years from multiple meteorological stations in a semi-arid region in Texas, USA [
18]. Using 90% of the data for model training, the prediction accuracy of the AI models was in the order of DL∼XGBoost>RF>LR∼SVM>LSTM. The authors concluded that the top-performing IAI model (XGBoost) exhibited comparable performance to the top-performing non-IAI model (DL) in predicting daily
. They developed an XAI model by coupling XGBoost with the SHAP method. The global SHAP analysis unveiled that the relative importance of the meteorological variables in
prediction was in the order of
>
>
>
>
for the study area. Local SHAP and LIME analyses identified the inflection point of each predictor above or below which
would increase. The inflection points were subsequently used to set up testable hypotheses using conditional probabilities to justify the XAI predictions and seek new knowledge. Considering the median observed
value as the threshold, the authors showed that if
17.16 kW/m²,
20.62 °C, and
72.17% at one of the sites, then
would almost surely be below the median
. To our knowledge, the XGBoost-SHAP-LIME model in this study was the first XAI model accompanied by testable hypotheses used for enhanced interpretability and explainability of daily
predictions. The authors concluded that the XGBoost-based XAI framework displayed comparable performance to DL in predicting
while holding physical interpretability of the predictors–predictand dynamics and unveiling the order of importance of the predictors in
predictions. Unlike in [
18], the feasibility of
predictions from a single meteorological variable was investigated using the optimized XGBoost, RF, and Deep Neural Network (DNN) models by comparing the results against daily
estimates from 32 years of local meteorological data in California, USA, including
,
,
,
,
,
and
in [
34]. Through the global Shapley and Gini-based feature importance analyses implemented with the RF and XGBoost models (yielding XAI models), they concluded that
was the most influential predictor at three sites with different climatic conditions, in agreement with the conclusions in [
18]. Using daily
as the sole predictor, daily FAO56-PME-computed
as the predictand, and assigning 80% of the data to train the AI models, they concluded that DNN exhibited better prediction accuracy than XGBoost and RF. Their approach differs in that it couples the enhanced interpretability of tree-based modeling with the high prediction capability of noninterpretable DNN modeling for
predictions.
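The conditional-probability check used in [18] to turn inflection points into testable hypotheses can be illustrated with a short sketch. The data, the single predictor, the threshold value, and the direction of the inequality below are all hypothetical; only the idea of estimating the probability that the predictand falls below its median, given a condition on the predictors, is taken from the text.

```python
from statistics import median


def conditional_probability(rows, condition, outcome):
    """Estimate P(outcome | condition) as a relative frequency."""
    selected = [r for r in rows if condition(r)]
    if not selected:
        return None  # the condition is never met, so the estimate is undefined
    return sum(1 for r in selected if outcome(r)) / len(selected)


# Hypothetical records: one predictor and a predictand that increases with it.
rows = [{"predictor": float(v), "et": 0.2 * v} for v in range(10, 30)]
et_median = median(r["et"] for r in rows)

# Hypothesis: when the predictor is below an assumed inflection point (17.0),
# the predictand lies below its median.
p = conditional_probability(
    rows,
    condition=lambda r: r["predictor"] < 17.0,
    outcome=lambda r: r["et"] < et_median,
)
print(p)
```

A probability close to 1 supports the hypothesis for the site in question, while a value near 0.5 would indicate that the condition carries little information about the predictand.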
A critical challenge with the earlier AI models was that the nonlinear relationship between climatic variables and the
makes it difficult to account for inherent uncertainties [
57]. This challenge was addressed in [
17] by formulating a novel probabilistic IAI model, built on the hybrid XGBoost-NGBoost framework, to predict daily
,
, and
using 3–5 years of daily meteorological data, including
,
,
,
,
, month,
(for
prediction), and
(for
prediction) in south-central Texas, USA. Different from the earlier AI models, the hybrid XGBoost-NGBoost was able to produce not only point predictions, but also the probability distribution over the entire outcome space to quantify uncertainties associated with
predictions. Using 90% of the data for model training, they demonstrated that the probabilistic approach exhibited great potential to account for data uncertainties, as 100% of the
, 89.9% of the
, and 93% of the
test data at three watersheds were within the models’ 95% prediction intervals. Using the XGBoost-SHAP (an XAI model) analysis, the authors identified the top three influential features to be
,
, and
for
;
,
, and month for
; and
, month, and
for
predictions at the semi-arid site.
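The coverage statistic reported above (the share of test data falling within the 95% prediction intervals) can be computed as in the following sketch. It assumes, for illustration only, that the probabilistic model returns a Gaussian predictive distribution described by a mean and a standard deviation for each test record; the numbers are made up.

```python
def interval_coverage(observed, mean, std, z=1.96):
    """Fraction of observations inside mean +/- z*std (95% interval for z = 1.96)."""
    inside = sum(
        1
        for y, m, s in zip(observed, mean, std)
        if m - z * s <= y <= m + z * s
    )
    return inside / len(observed)


# Hypothetical test observations and predictive distributions.
observed = [2.0, 3.5, 5.0, 9.0]
mean = [2.2, 3.0, 5.1, 6.0]
std = [0.5, 0.5, 0.5, 1.0]

coverage = interval_coverage(observed, mean, std)
print(coverage)  # three of the four observations fall inside their intervals
```

A coverage close to the nominal 95% suggests well-calibrated intervals; much lower values point to overconfident predictions.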
4.2.2. Precipitation Predictions
The spatiotemporal variability and uncertainties in precipitation (
P) measurements [
58] make it a difficult hydroclimatic variable to work with, although it is a critical predictor for diverse hydroclimatic processes, such as surface runoff, flood, droughts, and aquifer recharge.
Stable isotopes of hydrogen and oxygen (
and
) have been used as natural tracers to improve our understanding of hydrological and meteorological processes, including precipitation formation mechanisms [
59]. An XGBoost model was recently used to explore interannual and long-term variability in monthly
and
time series of
P using location and climate data [
60]. The location data included the latitude (
), longitude (
), and altitude (
) of the data site. The climate data included local climate data (e.g.,
P,
,
,
, vapor pressure (
)) and climate indices associated with large-scale atmospheric circulation (e.g., North Atlantic Oscillation index, the Scandinavian pattern) from a large number of gridded and time-series European data sources. In addition to the location and climate data, the month and season of the year and Köppen climate regions were used as predictors in the IAI model. The authors used 32,191 monthly observations of at least one stable isotope value from 270 stations for the period from 1960 to 2018, in which ∼20% of the data was used for model testing. They developed three independent IAI models using XGBoost, one each for
,
, and deuterium-excess (d-excess). They implemented three modeling steps: First, they ran these IAI models with the complete set of predictors. Next, they ran the models with the most important predictors only. Finally, each IAI model with the reduced predictors list also used the predicted predictands from the other two IAI models as the additional predictors. The overall IAI model was named Piso-AI, which is suitable to produce point and gridded monthly
and
of
P on demand, as the predictors are regularly updated. The model is useful for providing isotope input variables for ecological and hydrological applications and paleoclimate proxy calibration. The prediction accuracy of Piso-AI was reported to be better than that of other predictive tools when interannual variations were important. In our opinion, if the Piso-AI model were coupled with explanatory methods such as SHAP, it could provide enhanced insights into predictor–predictand dynamics and the overall results.
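For readers unfamiliar with the third predictand, deuterium excess is conventionally derived from the two isotope ratios as d-excess = δ²H − 8·δ¹⁸O. The sketch below applies this standard definition; the formula is not stated in the reviewed text, and the function name and sample values are illustrative assumptions.

```python
def d_excess(delta_2h, delta_18o):
    """Deuterium excess (per mil) from the conventional definition d = d2H - 8 * d18O."""
    return delta_2h - 8.0 * delta_18o


# Hypothetical monthly precipitation sample (values in per mil).
sample = {"delta_2h": -70.0, "delta_18o": -10.0}
d = d_excess(sample["delta_2h"], sample["delta_18o"])
print(d)
```

Because d-excess is an algebraic combination of the other two predictands, feeding the predictions of two models into the third, as in the final modeling step described above, gives each model access to closely related information.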
In hydroclimatic applications, gridded
P data at coarser spatial scales can be used for local
P estimates after downscaling, if they are shown to be representative of local climatic conditions. A RF model was used to assess the similarity of gridded monthly
P and
data from external sources and locally observed data [
61]. The suitability of seven external gridded
P and five gridded
datasets with a spatial resolution of 0.25–0.50° was evaluated and ranked with respect to monthly observed local time-series data at 57 stations in Egypt for the period 1979–2014. Four grid points surrounding a station were interpolated to the station location using an inverse distance weighting method to generate time series of observed and gridded external data pairs at each station. The similarity index was defined as the number of times that the observed and gridded data at the particular station followed the same path and were placed in the same terminal node of the same tree in the RF model. Different from other IAI model applications, the entire dataset was used to train the IAI model. Using the RF model, the authors identified the most representative external climate datasets that agree with monthly local
P and
at each station as well as their spatial variations. Because
P influences many hydroclimatic processes and decisions, such IAI-based similarity assessments between remotely sensed data and local measurements are indispensable in the development of local or regional water management decisions, especially when locally measured
P datasets are scarce or precarious.
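The interpolation of the four surrounding grid points to a station location, as described for [61], can be sketched with a basic inverse distance weighting routine. The power of 2, the use of plain Euclidean distance in degrees, and all coordinates and values are assumptions for illustration; the reviewed study may have used different settings.

```python
import math


def idw(station, grid_points, power=2.0):
    """Inverse-distance-weighted estimate at `station` from (x, y, value) points."""
    numerator = 0.0
    denominator = 0.0
    for x, y, value in grid_points:
        distance = math.hypot(x - station[0], y - station[1])
        if distance == 0.0:
            return value  # the station coincides with a grid point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical 0.25-degree grid cell with a station at its center.
grid = [(0.0, 0.0, 10.0), (0.25, 0.0, 20.0), (0.0, 0.25, 30.0), (0.25, 0.25, 40.0)]
estimate = idw((0.125, 0.125), grid)
print(estimate)  # equal distances give the plain average of the four values
```

Stations closer to one grid point than to the others receive an estimate dominated by that point, which is the intended behavior of the weighting.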
The effect of
P zoning on the accuracy of IAI-based downscaling of gridded
P data from remote sensing precipitation products with a spatial resolution of 0.25° to ground-based
P data with a spatial resolution of 1-km was investigated in [
62]. Such
P zoning was implemented to identify the predominant regional patterns of
P variability. The study was conducted across the Lancang–Mekong River basin, which has a total area of about 795,000 km², covers parts of Southwest China, Myanmar, Laos, Cambodia, Thailand, and Vietnam, and spans multiple climate zones. The monthly satellite-based
P data and ground-based
P data from 29 meteorological stations and 261 rain gauge stations from 2000 to 2014 were used in the IAI analysis. Twelve meteorological stations and 229 rain gauges in 2001 (wet year), 200 rain gauges in 2005 (normal year), and 24 rain gauges in 2009 (dry year) were used for model validation. The authors used the iterative rotated empirical orthogonal function analysis of ground- and satellite-based
P observations to delineate 6–7
P zones. They considered two cases: the first one did not involve discrete
P zones and RF was used for the entire study area; in the second case, the study area was divided into different
P zones and RF was applied independently to each zone. The authors implemented the RF model for downscaling, in which the latitude (
), longitude (
), altitude/elevation (
), slope (
), and normalized difference vegetation index (
) were used as predictors and satellite-based
P was used as the predictand. RF was trained and validated over the 0.25°-resolution data (coarser resolution). The validated RF was then used with 1 km resolution data (
,
,
,
) to predict
P at a 1 km resolution (finer resolution). The authors concluded that zoning-based downscaling outperformed non-zoning-based downscaling in terms of prediction accuracy. A permutation test implemented to assess the importance of the predictors revealed different importance rankings of the predictors responsible for
P distributions at different spatial scales (e.g., at each
P zone scale vs. at the entire study area scale). Thus, the spatial scale dependency of the predictors–predictand relation in this case could raise concern about the suitability of the RF model to predict
P at finer resolutions, after being trained with data at coarser spatial resolutions without implementing proper scale-dependent error corrections, as discussed in [
35].
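The permutation test mentioned above ranks a predictor by how much the prediction error grows when its values are shuffled. The following sketch shows the idea with a hypothetical, already-fitted model in place of the RF; the data and the model are made up for illustration.

```python
import random


def mse(model, X, y):
    """Mean squared error of `model` over rows X and targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, column, seed=0):
    """Increase in MSE after shuffling the values of one predictor column."""
    rng = random.Random(seed)
    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
    return mse(model, X_perm, y) - mse(model, X, y)


# Hypothetical fitted model that relies only on the first predictor.
model = lambda row: 2.0 * row[0]
X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [2.0 * row[0] for row in X]

importance_first = permutation_importance(model, X, y, column=0)
importance_second = permutation_importance(model, X, y, column=1)
print(importance_first, importance_second)
```

Repeating the calculation separately within each zone and over the whole study area would expose scale-dependent rankings of the kind reported in [62].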
4.2.3. Soil Moisture Predictions
Soil moisture (
) is a spatially heterogeneous variable that affects surface runoff, base flow, aquifer recharge, and vegetation cover [
63], and hence, it is a critical measure in hydrologic modeling and water and irrigation management decisions. Although remote sensing data have been commonly used to derive local-scale
data, a mismatch between them is a challenge to overcome. Predictors used for IAI-based
predictions in recent studies are summarized in
Table 1.
Similar to
P data, gridded data at coarser-spatial scales are commonly used to predict local-scale
. The RF model was used to downscale
data (at tens of km-scale) from passive microwave surface
products, including the soil moisture active passive (SMAP) and soil moisture and ocean salinity satellite (SMOS) products to obtain more accurate
data over an area of 2452 km² in China at finer spatial resolution (at 1 km-scale) [
35]. The authors attempted to predict local
data—after being downscaled from SMAP/SMOS using RF—from a set of predictor variables at finer resolution, involving vegetation (
,
,
), land surface (
), hydro-climatic (
), and topographic (
, and
) features. Their approach involved three main steps: (i) resample predictors to coarser resolution of the SMAP and SMOS data and establish a regression relation between the upscaled predictor variables and SMAP/SMOS
data at coarser resolution; (ii) resample the residuals at the coarse resolution and RF-predicted
data to finer resolution (1-km scale); and (iii) predict
at finer resolution from predictor variables at finer resolution using the RF regression developed in step (i) and add the residuals computed in (ii) to the predicted
data at finer resolution to determine local-scale
. The authors concluded that RF-downscaled SMAP data performed better than SMOS data. RF-SHAP (an XAI model) analysis unveiled that
,
, and
were the most influential features for SMAP-RF while
,
, and
were the most critical features for SMOS-RF. This study introduced a new practical approach for XAI-based local scale
predictions. It would be beneficial to examine whether the prediction accuracy of the proposed downscaling method could be further improved by using IAI models other than RF.
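The three-step, residual-corrected downscaling procedure described above can be sketched in a few lines. To keep the example self-contained, an ordinary least-squares line with a single predictor is used as a stand-in for the RF regression, and the coarse residual is passed unchanged to every fine pixel of its cell; both choices, as well as all numbers, are simplifying assumptions and not the method of [35].

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (stand-in for the RF regression)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
        (xi - mean_x) ** 2 for xi in x
    )
    return mean_y - slope * mean_x, slope


def downscale(fine_predictor, coarse_target):
    """fine_predictor holds one list of fine-pixel predictor values per coarse cell."""
    # Step (i): upscale the predictor by averaging and fit at coarse resolution.
    coarse_x = [sum(cell) / len(cell) for cell in fine_predictor]
    a, b = fit_line(coarse_x, coarse_target)
    # Step (ii): coarse residuals, resampled here by simple replication.
    residuals = [t - (a + b * x) for x, t in zip(coarse_x, coarse_target)]
    # Step (iii): fine-scale prediction plus the resampled residual.
    return [[a + b * v + r for v in cell] for cell, r in zip(fine_predictor, residuals)]


fine_predictor = [[0.2, 0.4], [0.5, 0.7], [0.3, 0.9]]  # hypothetical fine-pixel values
coarse_target = [0.15, 0.25, 0.30]  # hypothetical coarse soil moisture
fine_estimates = downscale(fine_predictor, coarse_target)
print(fine_estimates)
```

With a linear stand-in model, the fine-scale estimates within each coarse cell average back to the coarse observation, which is the purpose of the residual correction; a nonlinear model such as RF would satisfy this only approximately.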
Remote sensing techniques, however, capture only near-surface
features and storage [
64], which could differ from
at deeper depths in trends and magnitudes. Therefore, in situ
measurements were combined with remotely sensed terrain attributes to predict soil-water storage at uninstrumented regions in a basin in California, USA [
65]. The authors used the RF model to predict daily inter- and intra-annual
storage at 10-, 30-, 60-, and 90-cm depths for 6 years, using soil (
), topographic (
,
,
), and vegetation (
) features as the predictors. Based on this IAI-modeling set-up, the authors concluded that different predictors were more influential in different periods such as wet-up, snow cover, recession, and dry periods. For example, although
was consistently a critical feature in all periods,
peaked during the wet-up period while
and
peaked during the recession and dry periods. However, the chosen five predictors were static variables without temporal components, which were used to predict temporal variations in
at different depths. Inclusion of other time-variant predictors such as
P, snow-pack depth, and
in the IAI model could have captured temporal variations in
predictions more accurately.
Moreover, root zone soil moisture (
) is a critical variable for agricultural productivity, crop water stress, and drought monitoring. Accuracy of the optimized RF and physics-based (HYDRUS 1D [
66] with data assimilation) models was evaluated for interpolation (for data imputation) and extrapolation (for predictions using testing data) of daily
from a list of predictors, including meteorological (
P,
,
,
,
,
,
) and vegetation (
,
) features,
at 5 cm-depth at 15 locations over ∼32 months, lagged values of the
and meteorological variables, in addition to day of the year [
67]. The data length was relatively short, yet 50% of the data was allocated to train the RF model. The authors assessed the importance of the variables using the permutation method, which revealed that surface soil moisture, soil properties, and land cover types have larger impacts on
than meteorological variables. Different from earlier IAI-based analyses, the authors compared the performance of the IAI models over the entire period as well as for the extreme dry and wet conditions. They concluded that RF interpolations for
had higher accuracy than RF extrapolations. Moreover, RF interpolations exhibited better prediction accuracy than HYDRUS 1D simulations, whereas RF extrapolations were comparable to HYDRUS 1D simulations. However, RF overestimated extreme dry conditions and underestimated extreme wet conditions. This could be due to the relatively short time period used in the analysis, which possibly did not provide enough data to train the model properly for the extreme conditions. Nevertheless, the study demonstrated that the RF model is a computationally efficient alternative to the HYDRUS 1D model for predicting
.
4.2.4. Groundwater Potential Predictions
Assessment of groundwater potential (
) is critical for conservation, sustainable water management, and drought mitigation strategies [
68,
69].
has been predicted in data-scarce regions using AI models trained by groundwater level, spring inventory, meteorologic, topographic, geologic, soil and surface water data at nearby sites. Predictors used for IAI-based
predictions in aquifer data-scarce regions in recent studies are summarized in
Table 2.
Information from a limited number of groundwater well locations has been used in recent studies to predict regional-scale
in data-scarce regions. The RF and GBoost models were used to predict
categorically over a 3339 km² region in India using meteorologic (
and
P), topographic (
,
,
,
,
), soil (
,
,
,
), distance (
,
), and geologic (
) features as the predictors [
70]. The IAI models were trained and tested using target data from an equal number of groundwater wells and non-groundwater locations. By allocating 80% of the data for model training, the IAI models produced sensible predictions, where GBoost outperformed RF, and
,
,
, and
emerged as the most critical features based on the Gini index analysis. In a similar study, the RF, GBoost, and XGBoost models were implemented using geologic, hydrologic, topographic, and land cover features to predict categorically
at sites with no wells in an attempt to generate regional
maps over an area of 747 km² in South Korea [
71]. Information from an equal number of groundwater well locations and non-groundwater locations was used to train and test the IAI models, in which 70% of the data was used for training. The authors implemented the elastic net method a priori to eliminate features insignificant to
predictions. As a result, they only considered topographic (
,
,
), surface water (
), soil (
), distance (
,
,
), geologic (
), and soil (
,
) features as the predictors in the IAI models. Thus, different from [
70], the refined predictor list did not include meteorologic variables and
. The reduced number of predictors used in all three IAI models produced reliable
maps for the study area, where XGBoost performed better than GBoost (the second best) and RF. The better performance of XGBoost over GBoost was attributed to (i) the use of second-order derivatives in XGBoost (as opposed to first-order derivatives in GBoost) to minimize the loss function and obtain more accurate trees and (ii) regularization features implemented in XGBoost to avoid overfitting.
In the absence of detailed aquifer and groundwater level data, spring data have been used as a surrogate predictand to estimate
. The optimized parallel RF (PRF) and XGBoost were used to determine
categorically in data-scarce regions in Iran on the basis of spring data using only DEM-derived spring associated factors (DEM-SDF) [
72]. These factors included topographic (
,
,
,
,
,
,
,
,
), surface water (
), soil/surface (
,
), and distance (
) features, which were used as predictors. The authors used 944 spring locations and randomly generated 944 non-spring locations over an area of 1676 km² as the target data. Based on the 70:30 split ratio for the training and testing datasets, the authors reported that PRF and XGBoost predictions showed ∼80% similarity and predicted high
regions closely. Different from the conclusion in [
70], Gini impurity revealed that
,
,
, and
are the most indispensable features in
predictions based on spring data and DEM-SDF.
Similarly, an AI-driven regional
map was developed based on the spring data [
73]. The authors used the optimized RF, LR, Decision Trees (DT), Artificial Neural Networks (ANN), and their combinations (i.e., additional 11 AI models) to predict
categorically over a karstic aquifer in a mountainous region in Morocco using the spring inventory as the target variable, and meteorologic (
P), topographic (
,
,
,
,
,
,
,
,
,
,
,
), soil/surface (
), geologic (
,
,
), distance (
,
,
), surface water (
), surface and soil moisture-related (
,
) features as the predictors. The spring inventory data included 347 spring locations and 1124 randomly chosen non-spring locations. They allocated 75% of the data to train the models. Prior to AI analysis, they performed multicollinearity analysis to determine linear dependency among the predictors to avoid redundancy, and computed information gain (IG) to identify the predictors positively associated with the enhanced
to reduce the number of predictors. However, multicollinearity analysis is not required for IAI modeling, as the models can handle redundant predictors. Besides, RF-SHAP (an XAI model) can unveil more effectively and accurately the order of importance of the predictors and the inflection point of each predictor above/below which the predictor would result in enhanced or reduced
. Nonetheless, based on multicollinearity and IG analyses, the authors retained all the predictors in the AI analysis. Based on RF-driven ranking,
(lithologic),
,
(tectonic), and
P (meteorologic) were identified to be the most important predictors. Different from the analysis in [
72], the authors tested the predictive accuracy of the weighted-aggregation of RF, LR, DT, and ANN to estimate
, where the weights were set to the area under the success rate curve from each AI model. They concluded that weighted-average RF-DT and RF-LR-DT (IAI models) yielded the best prediction accuracy for
prediction for the semi-arid karstic mountainous region.
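The weighted aggregation of model outputs described for [73] can be sketched as follows, with each model's predicted probability weighted by its area under the curve. Normalizing the weights to sum to one, the model names in the dictionaries, and all numbers are assumptions made for the example.

```python
def weighted_ensemble(probabilities, weights):
    """Combine per-model probabilities using weights normalized to sum to one."""
    total = sum(weights[name] for name in probabilities)
    n_locations = len(next(iter(probabilities.values())))
    return [
        sum(weights[name] * probabilities[name][i] for name in probabilities) / total
        for i in range(n_locations)
    ]


# Hypothetical probabilities of high groundwater potential at two locations.
probabilities = {"RF": [0.9, 0.2], "DT": [0.7, 0.4]}
# Hypothetical areas under the success rate curve, used as weights.
auc = {"RF": 0.9, "DT": 0.6}

combined = weighted_ensemble(probabilities, auc)
print(combined)
```

The combined probability always lies between the lowest and highest individual model outputs and leans towards the model with the larger weight.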
The results from the studies discussed above are based on different sets of mostly static region-specific predictors. Therefore, it is difficult to make generalizations over relative predictive accuracy of the IAI models used. The IAI-based predictions discussed above require a priori domain knowledge of the variables and system, as the AI predictions based on DEM-SDF would be applicable only to basins where
is expected to be controlled largely by topographic features. Conversely, the topographic watersheds of karst catchments have little significance for their aquifers [
74]; therefore, such IAI models may not be applicable to estimate
in karstic aquifers. The predictors in the studies discussed above did not include geospatial information about aquifer characteristics such as aquifer type, aquifer thickness, depth to water table, and aquifer parameters (e.g., transmissivity or storativity) in predicting regional
due to scarcity of data, although these features strongly determine
and productivity of aquifers. Furthermore, although well-balanced datasets were used in [
70,
71,
72], an imbalanced dataset (1:3 ratio) was used in [
73]. Imbalanced datasets in model training, however, could bias the model towards the majority class, and hence, impair its prediction accuracy.
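One common way to counter such imbalance, shown here only as an illustration and not as something done in [73], is to weight each class inversely to its frequency during training. The sketch uses the spring and non-spring counts quoted above; the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# 347 spring locations (label 1) and 1124 non-spring locations (label 0).
labels = [1] * 347 + [0] * 1124
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, both classes contribute the same total weight to the training loss, so the minority class is no longer outvoted by the majority class.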
4.2.5. Groundwater Level Predictions
Groundwater levels (
) could be affected by climate factors, land use, pumping, and hydraulic interaction with surface and other subsurface waters. Short-term
predictions could be imperative for landslide prone areas [
75], in agricultural regions for scheduling irrigation [
76] and in regions that experience sudden increases in groundwater withdrawals or extreme climate events (e.g., heatwaves). Long-term
predictions under future climate scenarios are critical for development of sustainable groundwater management plans [
18] and sustainability of agricultural production systems [
77]. Predictors used for IAI-based
predictions in recent studies are summarized in
Table 3.
Using meteorological data, the optimized XGBoost, RF, and SVM models, and their hybrid versions were implemented with or without wavelet transforms (WT) for short-term monthly
(1–3 months ahead) predictions in Kumamoto City, one of the regions with the highest groundwater use in Japan [
78]. The authors used monthly time-lagged
, monthly-average
, average monthly total
P, and cumulative monthly
P as the predictors. WT was used to extract time-variant information such as trends and periodicity in the AI modeling. However, such time-variant domain knowledge can alternatively be incorporated using day, month, or year as engineered features in the AI modeling. The authors used 442 records and implemented an 85:15 ratio for the training and testing datasets. They concluded that SVM outperformed XGBoost and RF when the WT was not included. When the AI models were coupled with the WT, however, SVM and XGBoost exhibited comparable predictive accuracy while outperforming RF. WT-AI coupling apparently enhanced the prediction by 3–5%, which was more beneficial for 2–3 months ahead predictions. The authors adopted minimal-redundancy-maximal-relevance to rank the order of importance of the predictors.
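The alternative noted above, in which time-variant information enters the model through engineered features, can be sketched as follows. The lag set, the sine and cosine encoding of the month, and the sample series are assumptions for illustration and are not taken from [78].

```python
import math


def make_features(series, months, lags=(1, 2, 3)):
    """Build rows of lagged values plus a cyclical (sine/cosine) month encoding."""
    rows, targets = [], []
    for t in range(max(lags), len(series)):
        lagged = [series[t - k] for k in lags]
        angle = 2.0 * math.pi * (months[t] - 1) / 12.0
        rows.append(lagged + [math.sin(angle), math.cos(angle)])
        targets.append(series[t])
    return rows, targets


# Hypothetical monthly groundwater levels and their calendar months.
levels = [10.0, 10.5, 11.0, 10.8, 10.2, 9.9]
months = [1, 2, 3, 4, 5, 6]

X, y = make_features(levels, months)
print(X[0], y[0])
```

The cyclical encoding places December next to January, which a plain month number from 1 to 12 would not do; tree-based models can also work with the raw month number.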
The effectiveness of an optimized hybrid K-Nearest Neighbors (KNN)-RF model (a coupled non-IAI and IAI model) for short-term prediction (2 weeks to 3 months ahead) of daily
in a near-surface aquifer in a data-scarce region in Rwanda [
79] was analyzed. The authors used
measurements from a single borehole after removing anomalies via a time-series filtering method prior to their use in the AI analysis.
was related to
T,
P,
, and their 1–4 days lagged values. The AI analyses were performed using ∼2 years of daily data with 759 records. The authors implemented a ‘walk-forward’ approach to predict
from the input data while implementing an 88:12 ratio for the initial split for the training and testing datasets. KNN-RF consistently exhibited better prediction accuracy for 15-, 30-, 60-, and 90-day predictions than RF, KNN, SVM, and ANN. Using the KNN-RF model with different combinations of the predictors, the authors concluded that
,
T, and
time-lags in addition to the first lag
P were the most influential predictors on short-term
forecasts. This could have been more effectively analyzed using RF-SHAP (an XAI model), instead of multiple KNN-RF model runs with different combinations of the predictors. These IAI, non-IAI, and hybrid IAI and non-IAI modeling studies sought to predict short-term
based on local meteorologic and hydrologic data. Inclusion of groundwater withdrawals, aquifer parameters, and aquifer recharge in AI-based
forecast analysis could increase their wider acceptance by the water resources and hydrology community.
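The 'walk-forward' evaluation mentioned for [79] can be sketched as an expanding-window loop: the model predicts the next value, and the observation is then added to the training history. The persistence forecast below is a naive stand-in for the KNN-RF model, and the series is synthetic; only the 88:12 initial split is taken from the text.

```python
def walk_forward(series, initial_train_fraction, forecast):
    """Expanding-window evaluation with one-step-ahead forecasts."""
    start = round(len(series) * initial_train_fraction)
    errors = []
    for t in range(start, len(series)):
        history = series[:t]  # all observations available before time t
        errors.append(abs(forecast(history) - series[t]))
    return errors


# Naive stand-in model: the next value equals the last observed value.
persistence = lambda history: history[-1]

series = [float(v) for v in range(100)]  # hypothetical daily groundwater levels
errors = walk_forward(series, 0.88, persistence)
print(len(errors), sum(errors) / len(errors))
```

In practice the model would be refitted (or updated) on `history` at each step, which is what makes the approach more demanding than a single fixed split.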
Different from the applications above, the XGBoost, multivariate LR, RF, multilayer perceptron neural network (MLP), and SVR were used for image (map)-based prediction of monthly
in the southern regions of the African continent at the pixel-level from monthly terrestrial water storage (TWS) maps, the coordinates of the pixels on TWS maps, and monthly time-stamp [
80]. After imputing 10% of the missing monthly images, the authors generated 161 sequences of 12 consecutive images for the period from 2002 to 2019, in which the first 149 images were used for model training and the rest for model testing. The sample size to train the AI models was low in this application. Nonetheless, XGBoost with the gain metric determined that TWS pixel information from the 12th, 11th, and 1st preceding months was the most influential predictor set to estimate
in the current month. Among the AI models used, SVR reportedly yielded the best prediction accuracy in predicting
. In this application, XGBoost (an IAI model) provided the information on the feature importance and selection, and SVR (a non-IAI model) yielded overall better prediction accuracy, similar to the implementation of hybrid IAI and non-IAI models in [
34,
81]. The use of additional information on spatiotemporal variations in groundwater withdrawals, however, could have improved the accuracy of
predictions.
4.2.6. Streamflow Predictions
Streamflow (
) is impacted by climate change and human activities, such as dam construction, changing environment, and increased surface water diversions to meet the consumptive water demands in areas with increasing populations [
82]. Predictors used for IAI-based
predictions in recent studies are summarized in
Table 4.
Changing climate and intensified human activities could make the relation between
and predictors non-stationary, which was referred to as concept drift in [
83]. Because new climate change and human impacts on
are not captured in historical data used for model training, the AI model would not be informed about such gradual or abrupt unprecedented changes that would violate the stationarity assumption, unless the AI model is ‘intervened’ and informed of them. The performance of XGBoost with concept drift detection (CDD) was compared against XGBoost without CDD, RF, SVM, and DTR in predicting one-month ahead
at the Qingliu river catchment in China using meteorologic (
P,
), hydroclimatic (
), soil (
), and hydrologic (past
) features [
83]. In this study, CDD operates based on presumably normally-distributed historical error rate. In XGBoost-CDD modeling, when unprecedented
rates were detected, XGBoost was re-trained with the existing data; otherwise, it was incrementally trained. Using monthly data from 1989 to 2010 and assigning 70% of the initial data for model training, the authors found that XGBoost-CDD achieved higher prediction accuracy than XGBoost, RF, SVM, and DT, as XGBoost-CDD detected the abrupt change in
in 2003 due to rapid socioeconomic development, rapid population growth, and dramatic changes in land cover and land use in the region, which required XGBoost-CDD to be re-trained. In our opinion, the IAI framework in [
83] sets the stage for interventional IAI in hydrologic applications, as hydrological settings would likely be exposed to unprecedented consequences of human activity and changing climate on
more often in the future.
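The re-train versus incremental-update decision described above can be illustrated with a minimal sketch. It assumes that past error rates are roughly normally distributed and flags drift when a new error lies more than a chosen number of standard deviations above the historical mean. The function names, the threshold of 3, and the error values are hypothetical and are not taken from [83].

```python
from statistics import mean, stdev


def drift_detected(past_errors, new_error, z_threshold=3.0):
    """Flag drift when new_error lies far above the historical error distribution."""
    if len(past_errors) < 2:
        return False  # not enough history to estimate a spread
    mu = mean(past_errors)
    sigma = stdev(past_errors)
    if sigma == 0:
        return new_error > mu
    return (new_error - mu) / sigma > z_threshold


def choose_update(past_errors, new_error):
    """Return 'retrain' on detected drift, otherwise 'incremental'."""
    return "retrain" if drift_detected(past_errors, new_error) else "incremental"


# Hypothetical monthly error rates (assumed values for illustration only).
history = [0.10, 0.12, 0.09, 0.11, 0.10, 0.13]
print(choose_update(history, 0.11))
print(choose_update(history, 0.45))
```

In this sketch, an error close to the historical range leads to an incremental update, whereas an unusually large error triggers re-training.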
The optimized XGBoost-Extreme Learning Machine (ELM) model was used to predict one-month ahead monthly
in the Göksu-Himmeti catchment area in Turkey from hydrologic (multi-lagged
), meteorologic (
P,
), and hydroclimatic (
) data from 1973 to 2010, in which 75% of the data was used for model training. In this study, XGBoost was used as the feature selection tool and ELM as the predictor tool. The authors used ‘gain score’ in splitting a leaf into two leaves in XGBoost to determine the most influential lags among 30 lags for each predictor. After testing XGBoost with different combinations of multi-lagged predictors,
,
P, and
were reported to be the most critical features for one-month ahead
prediction for the study area. The feature importance ranking was used to select the features for ELM. The authors concluded that XGBoost-ELM (a hybrid IAI and non-IAI model) provided higher predictive accuracy than XGBoost alone. Similar to [
34], the advantages of the IAI and non-IAI models were combined in [
81] to achieve higher predictive precision of
.
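Before lagged predictors can be ranked by a tree-based model, the time series has to be rearranged into a table of lags paired with the one-step-ahead target. The sketch below shows one way to do this; the function name `lagged_dataset` and the toy series are assumptions, and the importance ranking itself (e.g., by XGBoost gain) is not included.

```python
def lagged_dataset(series, n_lags, horizon=1):
    """Pair [x(t), x(t-1), ..., x(t-n_lags+1)] with the target x(t+horizon)."""
    rows, targets = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        rows.append([series[t - lag] for lag in range(n_lags)])
        targets.append(series[t + horizon])
    return rows, targets


# Toy monthly series (assumed values); three lags, one-month-ahead target.
flow = [float(v) for v in range(10)]
rows, targets = lagged_dataset(flow, n_lags=3)
print(rows[0], targets[0])
```

Each column of `rows` corresponds to one lag, so a feature importance score computed per column can be read directly as the importance of that lag.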
In addition to IAI-based predictions of
from meteorologic, lagged hydrologic, hydro-climatic, soil-associated, and land surface features, IAI-based models were used to predict
from its spectral and frequency components. For example, singular spectrum analysis (SSA) and LGBoost were integrated to predict real-time urban runoff in Yuelai New City in China [
84]. The authors used 39 rainfall events in this study, of which 33 were used for model training and 6 for testing. After extracting the trend, fluctuation, and noise components from the runoff time series using SSA, they reconstructed the series using LGBoost. The motivation was that the data pre-processed with the SSA, or other decomposition methods, could significantly improve the AI performance. The authors noted that SSA-LGBoost predicted runoff with higher accuracy and peak error <18%, outperforming LGBoost and LSTM models. In another study, the Fourier Transform (FT) was used to decompose a 10-day inflow time series based on frequency-domain analysis, with each component comprising contiguous frequencies and exhibiting a clear physical meaning, and the performance of XGBoost and SVR in forecasting the decomposed components was tested [
85]. The authors used the Three Gorges Dam inflow series in China. The 10-day records from 1990 to 2009 were used for training and 2010 to 2015 for testing. Three decomposition strategies were tested: the centered 10-day inflow time series (only one decomposed component) and decompositions into four and seven components. Their results showed that FT-SVR with seven components produced a nearly perfect 10-day streamflow forecast and outperformed the other decomposition approaches. In addition, their analysis showed that FT-XGBoost performed worse than FT-SVR.
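The idea of decomposing a centered series into components made of contiguous frequencies can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [85]: the band boundaries are simply equal-width splits of the frequency index, the synthetic series is invented, and a naive transform is used only to keep the example free of external libraries.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform (O(n^2)); adequate for short series."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def inverse_dft_real(spectrum):
    """Inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def frequency_band_components(series, n_components):
    """Split the centered series into components of contiguous frequencies."""
    n = len(series)
    mean = sum(series) / n
    spectrum = dft([v - mean for v in series])
    n_freqs = n // 2 + 1  # distinct frequency indices 0 .. n//2
    bounds = [i * n_freqs // n_components for i in range(n_components + 1)]
    components = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # Keep index k together with its mirror n-k so each component is real.
        band = [c if lo <= min(k, n - k) < hi else 0 for k, c in enumerate(spectrum)]
        components.append(inverse_dft_real(band))
    return components


# Synthetic series of 36 ten-day periods: a slow cycle plus a faster one (assumed).
series = [10 + 5 * math.sin(2 * math.pi * t / 36) + 2 * math.sin(2 * math.pi * 6 * t / 36)
          for t in range(36)]
components = frequency_band_components(series, 4)
```

Because every frequency index belongs to exactly one band, the components add up to the centered series, so forecasts made per component can be summed and the mean added back to obtain a forecast of the original series.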
4.2.7. Water Level Predictions in Reservoirs, Lakes, and Delineation of Wetlands
Lakes and reservoirs are important fresh water sources for domestic, industrial, agricultural, and recreational water uses, regional flood control, and aquaculture [
86]. Water level (
) is an important physical indicator of lakes, and its fluctuations may impact the sustainability of lake ecosystems [
87], and consumptive water uses under current and future climate conditions and human activities [
88]. Similarly, wetlands are a critical component of a hydrologic system for maintaining hydroecology, flood control, providing nutrients, and controlling
in surface water systems [
89]. Predictors used for IAI-based predictions of
in reservoirs, lakes, and for delineation of wetlands in recent studies are summarized in
Table 5.
With regard to
predictions in reservoirs, the optimized Boosting, RF, Bayesian Linear (BL), and Neural Network (NN) models were used to predict a day- or week-ahead
in the Kenyir reservoir in Malaysia, operated for hydropower generation [
90]. Two scenarios with a small number of predictors were considered. In the first scenario, daily
P and
from 1985 to 2019 were used as the predictors to estimate
. In the second scenario, daily
from 2010 to 2019 was also used as the predictor. Using 80% of historical data to train the AI models, the authors achieved higher prediction precision for a day- or week-ahead reservoir
when they included
, where the prediction accuracy of the AI models ranked in the order of Boosting > RF > BL > NN. In this study, IAI models performed better than non-IAI models in predicting short-term
in a reservoir. The authors performed sensitivity analysis to assess prediction uncertainties. This could have been alternatively achieved by combining the Boosting model with the NGBoost as in [
17]. If additional predictors (e.g.,
, more lags in
,
P) are used in such analysis, SHAP analysis can be used to identify the most influential predictors to reduce the input dataset for the IAI and XAI modeling.
Aside from
predictions in reservoirs and lakes, the optimized RF model was used to infer the importance of climatic and abstraction features on
fluctuations in Lake Bracciano in central Italy, which is designated as an emergency water source to be used in severe droughts [
91]. The authors resorted to IAI modeling, as they did not have sufficient data on water exchange rates between the groundwater and the lake to construct a lake water-balance equation or use physics-based models. They analyzed the influence of short-term (e.g., run-off) and long-term (e.g., groundwater dynamics) effects of the monthly
P using
at different time scales (1–24 months),
,
,
, and month of the year on
for the period of 1955–2019. Using 50% of the data for model training and implementing the computationally expensive drop-column feature importance approach, they concluded that
,
month of the year,
,
,
, and
were the most critical features. This suggests that
P associated with the groundwater dynamics and water abstraction were the most influential processes, while
was the least critical variable. Using the RF model, the importance of
with respect to long-term
P variability was shown to increase by 15% after 1985. The authors noted that the importance of a month index needs to be analyzed in combination with the associated time scale of the
P anomaly. This and the feature importance analysis can be done effectively using the local and global SHAP analysis. The SHAP analysis can also reveal the effect of percentage increases or decreases in the predictors’ values (e.g.,
and
) on
fluctuations, which are imperative to assess the potential impacts of changing climate and water abstraction policies on
.
As for
predictions in wetlands, RF, DT, SVM, and ANN were used to predict daily
in Upo wetland in South Korea, which is a large inland wetland with high biodiversity [
92]. The predictions were based on 1–3 days lags of minimum, maximum, and average
,
P, maximum and minimum
, and
at the nearby embankment and drainage pump station. Using the measurements from 2009 to 2015 and keeping the data from the last two years for model testing, the authors concluded that RF outperformed DT, SVM, and ANN in predicting the overall trend, peak values, and peak occurrence times of
. They also noted the need for further improvements in peak value predictions and peak delay error reductions, which could be achieved by accommodating the information on soil characteristics,
, and backflow during rainy seasons if/when such data are available. Based on the increase in node purity in the RF-based modeling, the authors identified that 1–3-day lags in
at the nearby embankment, 1-day lag in
P and in
at the drainage pump stations were the most critical features in predicting
at the wetland. Alternatively, RF-SHAP (an XAI model) could have been used for the feature importance ranking.
In addition to the use of individual AI models, multilayer pattern recognition tools based on multiple supervised AI models have been developed to construct predictive maps based on point-source observations. One such tool, MLMapper, is an AI-based predictive map development tool that performs predictive analyses using 20 different AI models (including IAI and non-IAI models) and site-specific predictors. MLMapper was used to delineate the surface area of groundwater-dependent, ecologically sensitive wetland areas in central Spain, using information on geologic (
), hydrologic (
,
), topographic (
,
), and soil (
,
,
,
) features [
93]. The dataset, which consisted of 75 known wetland points and 75 non-wetland points, was, however, small for typical AI modeling. The authors varied the split ratio for the training and testing data from 50:50 to 80:20. They concluded that tree-based models (ERT, RF) outperformed most other supervised classifiers in terms of raw test score, surface area, and number of explanatory variables required for mapping. Trained AI models predicted larger wetland surface areas than the natural inventory, suggesting that a combination of the features identified additional wetland areas not captured in field surveys. Although MLMapper reportedly performs a collinearity test to identify and eliminate redundant features, this is not a requirement for tree-based IAI models. Weighing and permutation importance methods used with the ERT and RF revealed that
,
,
, and
were the most influential features in determining the spatial extent of wetlands. However, ERT-SHAP or RF-SHAP (XAI models) could have been used instead to rank the most influential features without resorting to the recursive feature elimination methods implemented with MLMapper. Moreover, local SHAP analysis could have been used to determine the inflection points above or below which the predicted wetland surface area (represented as a binary variable) may increase or decrease with changes in predictors’ values. Therefore, we expect to see the use of the local and global SHAP analyses in such automated AI-based predictive map construction tools in the near future.
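To make concrete what a local SHAP-type attribution represents, the sketch below computes exact Shapley values by brute force for a small toy model. It is an illustration under stated assumptions rather than the SHAP library: absent features are replaced by a single baseline value, the function names and the toy model are hypothetical, and the enumeration over all feature subsets is feasible only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each feature."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their actual value; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical response with two additive terms and one interaction."""
    return 2.0 * z[0] + 3.0 * z[1] + z[0] * z[2]


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi, sum(phi))
```

The attributions sum to the difference between the prediction at `x` and the prediction at the baseline, and the interaction term is shared equally between the two features involved. Global importance rankings are typically obtained by averaging the absolute local attributions over many samples.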
4.2.8. Water Quality Predictions
Prediction of salinity and pollution levels of surface water and groundwater, and identification of the most critical physicochemical parameters affecting local and regional water quality are imperative for the sustainable use of these waters and the well-being of aquatic ecosystems [
94]. Predictors used for IAI-based water quality predictions in previous studies are summarized in
Table 6.
The water quality index (WQI), which integrates several physical and chemical factors into a single parameter, has been commonly used to evaluate or categorize the quality of groundwater and surface waters [
95]. The predictive performance of the optimized RF, XGBoost, ANN and DL models was analyzed in determining the entropy weight-based groundwater quality index (EWQI) in the Mahanadi basin in India from a set of physicochemical parameters, involving pH, TDS, TH, Ca²⁺, Mg²⁺, Na⁺, K⁺,
,
,
,
,
, and
[
96]. The authors applied the AI models to data from 226 locations. They varied the split ratio for the training and test data from 75:25 to 85:15 to seek the best prediction accuracy. Although the authors noted that data normalization should be performed prior to applying these AI models, data normalization is not required for RF and XGBoost. The predictive performance of the AI models was reported to be in the order of DL > XGBoost > ANN > RF, in which DL (a non-IAI model) yielded better predictive accuracy than XGBoost (an IAI model), yet it was unable to unveil the reasoning behind the predictions. Therefore, the authors resorted to inter-criteria correlation to determine the order of importance of the predictors. However, this could have been accomplished by XGBoost-SHAP (an XAI model), which can also provide inflection point values for each predictor above or below which EWQI would increase or decrease.
Groundwater salinity is a critical water quality measure that could affect sustainable use of inland or coastal aquifers. The optimized XGBoost, multiple linear regression (MLR), and DNN models were used to map spatial distribution of groundwater salinity, described in terms of
, in a coastal aquifer of the Caspian Sea in Iran using data from 140 piezometric wells [
97]. The authors used a 75:25 split ratio for the training and test data. Hydrogeologic (
,
), site-specific (
), meteorologic (mean annual
P), hydro-climatic (
), topographic (
,
) features were initially considered as the predictors. The authors used the MLR model to identify the contribution of each predictor to
in a stepwise manner by adding and removing each predictor to MLR until they reached the maximum predictive accuracy on the test data. Based on the MLR analysis,
and
were removed from the predictors list due to their negligible contributions to
. However, the use of a linear model to rank the importance of the predictors and remove the least important ones from the AI models for a nonlinear problem is questionable. XGBoost-SHAP (an XAI model) would have been an accurate and robust choice to rank the importance of the predictors for such nonlinear problems. Nonetheless, their analysis unveiled that the predictive performance of the AI models was in the order of XGBoost > DNN > MLR on the test data, indicating that XGBoost (an IAI model) exhibited higher prediction accuracy than DNN (a non-IAI model) in groundwater salinity predictions in a coastal aquifer.
The optimized CatBoost, XGBoost, LGBoost, and RF models were used to predict groundwater salinity in a multilayer coastal aquifer over an area of 3312 km² in the Mekong Delta, Vietnam [
98]. Using 216 groundwater samples taken in rainy and dry seasons from 2013 to 2018 with the influencing factors, including site-specific (
,
,
) and hydrogeologic (
,
,
,
,
,
,
,
,
,
) features, and assigning 70% of the data for model training, they concluded that the predictive accuracy of the AI models was in the order of CatBoost > XGBoost > RF > LGBoost. Importance of the predictors was determined using the CatBoost ranking. As a result,
,
, and
were removed from the predictors list. The reduced input set enhanced the prediction accuracy of CatBoost slightly. Although the authors normalized the predictors prior to AI analysis, such normalization is not required for these IAI models. Using the CatBoost model, the authors constructed a regional groundwater salinity map based on the predicted chloride concentrations, which unveiled that paleo-saline groundwater salinization is the main process for increased salinity in the study area and identified salinity-affected populations. Thus, the IAI modeling in this study revealed information not only about the salinity intrusion mechanism, but also about its social dimension.
Vulnerability maps have been used to identify areas most vulnerable to water quality deterioration. Index-based techniques have been widely used to prepare groundwater vulnerability assessment maps due to their computational simplicity and lower data demand compared to statistical or process-based simulation techniques [
99]. GALDIT is an index-based method to assess groundwater vulnerability to saltwater intrusion using information on hydrogeologic (aquifer type,
,
), site-specific (
) features, and the impact of existing seawater intrusion status. The main drawback of such index-based methods is the subjectivity of each variable’s rating and weight in estimating the vulnerability index. The optimized XGBoost, LGBoost, Adaptive Boosting of Decision Trees (AdaBoost), CatBoost, and RF were used to overcome the subjectivity of the weights and ratings assigned to the variables in the GALDIT framework when calculating a groundwater vulnerability index over a 500 km² area in the Lake Urmia catchment in Iran [
100]. The authors implemented bootstrap aggregation (bagging) and disjoint aggregation (dagging) sampling methods to increase the data size and reduce the prediction variance in this study area with the initially small data size. The GALDIT indices, after being adjusted using TDS measurements, were used as the predictand. Using 70% of the data to train the AI models, the authors concluded that the prediction accuracy of the models was in the order of XGBoost > AdaBoost > RF > CatBoost > LGBoost when a bagging or dagging resampling method was not implemented. Although these IAI models improved the prediction accuracy of groundwater vulnerability by ∼15% in comparison to the standard GALDIT framework, and an additional precision enhancement of ∼5% was achieved using bagging-XGBoost, the final prediction accuracy was not statistically significant. The authors noted that the six predictors implemented in the GALDIT framework may not be sufficient to determine groundwater vulnerability. They also noted that the AI models chosen in their study cannot suggest a new weight or rating score for each variable because the IAI models are ‘black box’ models. This statement is, however, questionable. Although ensembling can make AI model interpretability and explainability harder, the boosting and bagging AI models are not black-box models, as they can readily be coupled with the explanatory methods for enhanced interpretability (IAI models) and explainability (XAI models), as discussed in
Section 2. In fact, information on new weights and ratings to enhance the groundwater vulnerability index can be obtained by coupling the bagging and boosting AI models with the SHAP method, as in [
18].
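The difference between the two resampling schemes mentioned above can be sketched in a few lines: bagging draws bootstrap samples with replacement, whereas dagging splits the data into disjoint subsets. The function names and the toy data are assumptions for illustration, and the base learners that would be trained on each subset are omitted.

```python
import random


def bagging_samples(data, n_samples, rng):
    """Bootstrap: each sample draws len(data) items with replacement."""
    return [[rng.choice(data) for _ in data] for _ in range(n_samples)]


def dagging_samples(data, n_folds, rng):
    """Disjoint: shuffle once, then deal the items into non-overlapping folds."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
data = list(range(20))          # hypothetical sample indices
bags = bagging_samples(data, n_samples=5, rng=rng)
dags = dagging_samples(data, n_folds=4, rng=rng)
print(len(bags[0]), [len(fold) for fold in dags])
```

In both schemes one base model is trained per subset and the predictions are aggregated; a bootstrap sample usually repeats some observations and omits others, while the disjoint folds together use every observation exactly once.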
Chemicals used on farmlands pose risks to water quality, human health, and environmental and ecological well-being. The optimized XGBoost, ANN, and SVM were implemented to predict nitrate and pesticide concentrations in groundwater (regression problem) and associated risk (classification problem) using hydrogeologic features (e.g., aquifer type and properties), land use features (e.g., nearby croplands, forest areas, primary water uses), water quality measures (
,
), and other physicochemical parameters [
33]. The analysis was conducted in a data-scarce region, involving 303 sampling wells from 12 midcontinental states in the USA, and 80% of the data was used for model training. The authors analyzed imbalanced classes using ‘confusion matrices’ in classification problems and implemented oversampling and cost-sensitive learning to address the problem of imbalanced classes. They noted that ANN performed better than XGBoost for the regression task, but XGBoost produced a majority of the best predictions among all three models. In addition, unlike the ANN and SVM models, XGBoost-SHAP (an XAI model) identified the order of importance of the predictors influencing the nitrate and pesticide concentrations and associated risk classifications, and indicated that nitrate and pesticide were the most important predictors of each other. In another study, the RF and MLR models were used to explain groundwater
contamination at the African continent-scale in relation to land use, soil type, hydrogeology (aquifer type,
K,
,
), topography, climatology (climate and rainfall class), nitrogen fertilizer application rate, and
in the absence of a systematic groundwater monitoring program [
101]. The analysis focused on spatially-variant mean
without addressing their temporal variability. Using 80% of the data for model training, the authors concluded that RF outperformed MLR in predicting
. The main advantages of RF (an IAI model) over MLR (a statistical model) are that RF is (i) a non-parametric model, i.e., the model structure does not need to be specified a priori, (ii) more efficient in determining nonlinear relationships and patterns between target and multidimensional predictors without relying on restrictive assumptions such as a particular statistical distribution for residuals or non-collinearity among the predictors, (iii) as interpretable as MLR yet provides better predictive accuracy, and (iv) robust to outliers. These advantages are equally applicable to other tree-based ensemble AI models.
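The handling of imbalanced classes mentioned above (oversampling, followed by inspection of a confusion matrix) can be sketched as follows. The function names, class labels, and toy data are hypothetical, and random duplication of minority-class samples is only one of several oversampling strategies.

```python
import random
from collections import Counter


def random_oversample(rows, labels, rng):
    """Duplicate minority-class rows at random until all classes are equally frequent."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def confusion_matrix(y_true, y_pred, classes):
    """Rows index the true class and columns index the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


rng = random.Random(0)
rows = [[i] for i in range(10)]                 # hypothetical feature rows
labels = ["low"] * 8 + ["high"] * 2             # imbalanced risk classes (assumed)
bal_rows, bal_labels = random_oversample(rows, labels, rng)
cm = confusion_matrix(["low", "low", "high", "high"],
                      ["low", "high", "high", "high"],
                      ["low", "high"])
print(Counter(bal_labels), cm)
```

Oversampling is normally applied to the training subset only, so that the test data keep the original class proportions and the confusion matrix reflects performance on the rare class without duplicated samples.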
As mentioned above, WQI has also been used to assess water quality in stream waters. The ERT, DT, and SVM models were used to predict WQI at the Lam Tsuen River in Hong Kong from a set of physicochemical features [
102]. Monthly physicochemical features included
,
,
,
,
,
,
,
,
,
from 1998 to 2017. The author noted that when these 10 features were used as the predictors, the prediction performance of the AI models was in the order of ERT > SVM > DT. By trying different combinations of the predictors, the author found that ERT (an IAI model) with a reduced list of predictors, including only
,
, and
achieved the second best prediction accuracy. Thus, in the absence of a full set of physicochemical data, ERT with
,
, and
could still provide a good estimate for WQI for surface waters. However, instead of manually trying different (and gradually reduced) combinations of predictors in search of high WQI precision, ERT-SHAP (an XAI model) can be used to identify high-fidelity ERT models with the fewest predictors.
Hybrid physics-based and AI models have also been used to predict water quality measures. The XGBoost model was combined with the Soil and Water Assessment Tool (SWAT) [
103] to estimate TDS and better understand river water salinity in the semi-arid, agricultural Rio Grande watershed in Texas [
104]. XGBoost was trained with water quantity and quality data that were monitored in nine locations. The predictors used in their study were physicochemical (
,
,
), meteorologic (
P), topographic (
), and hydrologic (
,
, dominant
). Results from the calibrated SWAT model were used as inputs to XGBoost to predict TDS. However, the SWAT model could not be properly calibrated for all studied locations due to a lack of data. In addition, the insufficient data compromised XGBoost training and caused overfitting. These findings highlight the importance of high-quality and sufficient data for proper analysis with AI. The authors also argued that if additional water quality parameters are monitored, more predictors could be used and the results would be more accurate. Despite the insufficient data, the AI modeling approach proved to be advantageous over simple SWAT modeling, as it improved the bias and variance of TDS estimates.
In addition to physicochemical parameters, water surface temperature (
) is an influential factor for water ecosystems and, hence, for successful water management plans. The performance of five AI models was compared in predicting
of 25 lakes in Poland [
105]. The analyzed models were ERT, multivariate adaptive regression splines (MARS), M5 Model tree (M5Tree), RF, and MLP. Although AI models have been successfully used in broad hydroclimatic applications, none of the AI models in [
105] was able to exceed the prediction accuracy of the physics-based ‘air2stream’ model [
106]. The authors suggested including more predictors to potentially improve the prediction accuracy of the AI models.
As for potential future directions, IAI and XAI can be used to examine how
,
, total and reactive iron (
), redox potential, and sulfate (
) and associated biogeochemical processes [
107,
108] in freshwater environments could vary with the depth in response to changing hydroclimatic conditions under future climates. This could be useful to predict the depths at which aerobic and anaerobic processes prevail, which would have direct impacts on future aquatic ecology and consumptive water use. In addition, infiltration of micro and nanoplastics into freshwater environments is becoming a growing concern worldwide [
109,
110]. When more regional and global data become available, IAI and XAI models could be useful to analyze the relative importance, interdependency, and interaction of environmental factors (e.g., minerals, pH, natural and dissolved organic matter, ionic strength, net surface charge of plastics [
111]) on the fate and transport of micro- and nanoplastics in aquatic environments and, consequently, their ecological impacts under different hydroclimatic conditions.
4.2.9. Flood Hazard Risks Prediction
Floods are caused by heavy rainfall over lowlands with gentle slopes and low water infiltration capacities, and can be accompanied by debris flows and landslides. Floods often cause many casualties and property losses. Such extreme events are expected to occur at higher frequencies in a globally warming climate and due to intensified human activities [
112]. Flood risk assessments are important for flood insurance, floodplain management, and disaster warning systems. AI-based flood predictions and risk assessments so far typically focus on passive predictions without considering adaptation measures and resilience of social and economic dimensions. The predictors used in recent IAI-based flood forecast analysis are summarized in
Table 7.
Hydrodynamic models are commonly used for flood management. These models solve complex physical equations to estimate floodplains, which makes them computationally expensive, especially two-dimensional (2D) models. This drawback hinders the application of such models to large-scale domains, for which AI can be an alternative. For example, the RF and MLP models were combined for fast water depth predictions [
113]. RF was applied to identify wet (flooded) and dry cells using flow and the domain coordinates as inputs. Then, MLP used RF’s output to compute river depths in the wet cells. The authors used the International River Interface Cooperative software (iRIC) model with FaSTMECH (Flow and Sediment Transport with Morphological Evolution of Channel) solver [
114] for hydrodynamic modeling, which was calibrated and used to train the AI models. Seven events with different flow magnitudes (10, 50, 95, 120, 150, 300, and 400 m³/s) were used for training and five events with different flow magnitudes (20, 30, 45, 225, and 350 m³/s) were used for testing. This approach was evaluated on the Green River in Utah, USA, and was able to reduce the simulation time by a factor of 60 with satisfactory prediction performance. However, the method was tested for a single location, and its prediction capabilities for other reaches still need to be evaluated. Generalization to different areas is essential for the applicability of such models to large-scale domains.
The performance of RF to predict runoff discharge was compared against the ‘hydromad’ hydrological model [
115] for 95 basins in the USA and Canada [
116]. In this study,
P,
,
and
were used as the predictors. In addition, the effects of catchment characteristics were also evaluated by including additional predictor variables, such as the standard deviations of
P,
and
within the catchments and
. Their results showed that climate conditions and elevation could affect the RF performance. Although the authors noted that RF can be an alternative to traditional hydrological models, they highlighted that RF failed to predict high-magnitude flows. In addition, RF only provided robust results for catchments with a warmer climate and lower altitudes. Further research was suggested to increase its accuracy for larger-magnitude events and to improve RF prediction capabilities in more heterogeneous catchments. In colder catchments, for instance, the authors suggested including snow and soil moisture as predictors. In semi-arid regions, lack of flood training data compromised the model performance. Their results show that this type of AI model is suitable for use in large-scale basins and can improve flood risk assessments at a national or continental scale.
In some other applications, data-driven AI models were used as a sole predictor for flood risk. Current and future flood risk in the Kalvan watershed in Iran was evaluated with AI [
117]. The future conditions were evaluated for 2050, with the projected changes in climate and land use. The authors used conditional inference random forest (CIRF), GBoost, and XGBoost to model the flood risk. In addition, a combined prediction with these three approaches was also evaluated. Twenty predictors were used to build the models, including those associated with the disaster-inducing factors (annual
P,
,
,
,
,
,
,
) and disaster-breeding environmental factors (
,
,
,
,
,
,
,
,
,
,
,
). The results indicated that the combined approach had the highest accuracy, followed by GBoost, XGBoost, and CIRF. In general, all models attained satisfactory performance and are suitable for flood risk mapping. Similarly, LGBoost and CatBoost were used to determine flash flood susceptibility, and their performance was compared with that of RF [
118]. The authors used over 400 flood maps to train and test the models, split into 70% for training and 30% for testing. A total of 14 controlling factors were selected, which included those associated with the disaster-inducing factors (
P) and disaster-breeding environmental factors (
,
,
,
,
,
,
,
,
,
,
,
,
). All three IAI models generated accurate flash flood susceptibility maps; however, LGBoost outperformed RF and CatBoost. In a similar work, 13 controlling factors, including a disaster-inducing factor (
P) and disaster-breeding environmental factors (
,
,
,
,
,
,
,
,
,
,
,
) were used to identify areas prone to flash flooding using ERT and different variants of RF [
119]. Using 256 flood susceptibility points and 256 randomly chosen points in a watershed, and allocating 70% of the data to model training, the authors concluded that ERT showed better prediction accuracy than RF. Although the authors performed collinearity analysis to determine linear dependency among the predictors to avoid redundancy, such analysis is not required for ERT and RF. The authors concluded that topographical and hydrological features are the most critical features in flash flood predictions. Such feature importance analyses can alternatively be performed using SHAP analysis, which can also unfold interrelations and interdependencies among the predictors.
AI models have also been used to assess regional-scale flood hazard risks. The RF model was used for regional-scale categorical flood hazard risk assessments over 27,363 km² with 5000 sample points in the Dongjiang River Basin in China [
120]. The predictors included disaster-inducing factors (
,
,
) and disaster-breeding environmental factors (
,
,
,
,
,
,
,
). The authors considered four risk levels, including highest (with the shortest recurrence interval), high, low, and lowest (with the longest recurrence interval) based on historical flood data. They compared predictive accuracy of RF against SVM and noted that both models identified regions with different flood risks reasonably well. Moreover, based on the Gini index,
,
,
,
, and
were the most critical factors to assess flood hazard risks. Similarly, the optimized GBoost, XGBoost, RF, SVM, MLP, and Convolutional Neural Network (CNN) were used to develop a flood risk map to identify regions with low, moderate, high, and highest risk in the Pearl River Delta in China, based on information obtained from flood risk inventory maps [
121]. Different from [
120], the study in [
121] also included disaster-bearing body factors in the AI-based decisions. Using GBoost, XGBoost, RF, SVM, MLP, and CNN, the authors evaluated flood risk using disaster-inducing factors (
,
,
,
), disaster-breeding environmental factors (
,
,
,
,
,
), and disaster-bearing body factors (
,
). They used a split ratio of 70:30 for the training and test datasets. Predictive accuracy of the AI models was reported to be in the order of GBoost > XGBoost > RF∼CNN > MLP > SVM, in which the flood risk prediction accuracy of GBoost, XGBoost, and RF (IAI models) exceeded that of CNN, MLP, and SVM (non-IAI models). Based on the Gini index analysis of the GBoost predictions, the authors concluded that
,
,
,
, and
were the most critical predictors in the order of importance for flood risk assessments. Validation of these findings and their extension to urban, rural, and coastal areas under different climate zones using XAI models with SHAP analysis are worth investigating further in follow-up studies. As for the flash flood risk assessments, the XGBoost and Least Square Support Vector Machine (LLSVM) models were used to develop flash flood risk maps for the 390,000 km² study area in China [
122]. The authors assessed the flood risks based on information on disaster-inducing factors (annual M3HP and M31D, annual
P), disaster-breeding environmental factors (
,
,
,
,
,
,
), disaster-bearing body factors (
,
), and flash flood prediction efforts. Their training data included both flash-flooded and randomly selected non-flooded sites, and they allocated 70% of the data to model training. They concluded that XGBoost (an IAI model) outperformed LLSVM (a non-IAI model) in predicting the flash flood risk. Although the authors noted that XGBoost cannot provide factor importance analysis after model development, XGBoost can indeed perform such analysis when it is coupled with the SHAP method, as demonstrated in [
17,
18].
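The Gini-index-based rankings reported in the studies above rest on the impurity decrease achieved by splits on each predictor. A minimal sketch of that quantity for a single candidate split is given below, assuming hypothetical toy data (the labels, predictor values, and thresholds are illustrative and are not taken from the cited studies). In tree ensembles, such decreases are accumulated over all splits and trees to rank the predictors.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(values, labels, threshold):
    """Gini impurity decrease obtained by splitting on values <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

# Hypothetical toy sample: 1 = flooded site, 0 = non-flooded site
labels    = [1, 1, 1, 0, 0, 0]
rainfall  = [90, 80, 85, 20, 30, 25]   # separates the two classes cleanly
elevation = [10, 50, 30, 20, 60, 40]   # separates them only weakly

gain_rain = impurity_decrease(rainfall, labels, threshold=50)
gain_elev = impurity_decrease(elevation, labels, threshold=35)
```

Here the split on the rainfall-like predictor removes all impurity, whereas the split on the elevation-like predictor removes very little, so the former would rank higher under a Gini-based importance measure.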
As originally noted in [
120], none of these AI-based flood prediction models can address the influence of hydraulic mitigation structures (e.g., dikes, levees, reservoirs) that play an important role in flood control and reduce the associated risk. Interventional AI modeling could be the proper method for such analysis in the near future.
4.2.10. Drought Predictions
Integration of drought predictions into societal decision-making processes is critical for sustainable and climate-resilient water, irrigation, and ecohydrology management [
123]. Predictors used for IAI-based drought predictions in recent studies are summarized in
Table 8.
The performance of optimized Decision Trees, AdaBoost, RF, and ERT was compared against the MLR in predicting hydrological droughts in ungauged areas in two watersheds in South Korea using remotely sensed data from six other watersheds [
124]. The authors used 16 years of monthly data acquired at multiple locations from 2002 to 2017 and allocated ∼70% of the data to model training. Drought severity was expressed at the 3-, 6-, 9-, and 12-month time scales in terms of monthly streamflow percentiles and related to meteorologic (monthly
P) and hydroclimatic and soil-associated (
,
,
,
) factors, in addition to the month of the year. The study concluded that AdaBoost (with the best prediction accuracy), RF, and ERT (IAI models) successfully detected observed hydrological droughts. The authors used permutation importance scores to identify the order of importance of the predictors. The analysis revealed that
P, followed by
(at the 3-month time scale) or
(at longer time scales) are the most critical predictors in forecasting hydrological droughts. As the authors noted, this IAI framework can be used to predict hydrological droughts in ungauged watersheds, if the ungauged basin characteristics are similar to gauged basins used in model training, suggesting that such applications require a priori domain knowledge.
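Permutation importance, as used in this study, scores a predictor by how much the prediction error grows when that predictor's values are shuffled. A minimal sketch is given below, assuming a hypothetical stand-in for a trained regressor and synthetic data (the function `model`, its coefficients, and the predictor names are illustrative and are not taken from [124]).

```python
import random

def model(row):
    # Hypothetical stand-in for a trained regressor: the response depends on
    # precipitation and soil moisture, while "month" is ignored.
    return 0.6 * row["precip"] + 0.3 * row["soil_moisture"]

def mse(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(rows, targets, feature, repeats=20, seed=0):
    """Mean increase in MSE after shuffling one predictor column."""
    rng = random.Random(seed)
    base = mse(rows, targets)
    increases = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild the rows with only the chosen predictor permuted
        shuffled = [dict(r, **{feature: v}) for r, v in zip(rows, column)]
        increases.append(mse(shuffled, targets) - base)
    return sum(increases) / repeats

# Synthetic monthly records (illustrative only)
data_rng = random.Random(42)
rows = [
    {"precip": data_rng.uniform(0, 100),
     "soil_moisture": data_rng.uniform(0, 50),
     "month": m % 12 + 1}
    for m in range(36)
]
targets = [model(r) for r in rows]

scores = {f: permutation_importance(rows, targets, f)
          for f in ("precip", "soil_moisture", "month")}
```

A predictor that the model does not use receives a score of zero, while predictors with a stronger influence on the response receive larger scores, which gives the order of importance.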
The XGBoost and ANN models were used for drought forecasts based on the Standardized Precipitation Evapotranspiration Index (SPEI) 1–6 months in advance. The authors used AI models to predict SPEI in a study area in the northwest part of China from monthly-averaged meteorological and climatic variables, their lagged relationships including SPEI, and month of the year [
125]. The meteorological variables included
,
,
,
,
,
,
P, and sunshine duration using data from 32 stations during 1961 to 2016. They computed the
through the PME. Climate predictors involved
,
,
,
,
, and
. They used sunshine duration as a surrogate variable for
, as
measurements were not available at the stations. They concluded that XGBoost (an IAI model) outperformed ANN (a non-IAI model) for overall droughts and drought categories. The authors used a distributed lag nonlinear model to select the optimal predictors and their lag time; however, they did not disclose the order of importance of the predictors and their dependency relations, which could have been revealed by XGBoost-SHAP (an XAI model). The authors used a linear booster with the XGBoost model, and noted that prediction accuracy could have improved if a tree booster had been implemented instead.
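Forecasting an index several months in advance from lagged predictors requires pairing each target value with the predictor values available at the forecast time. A minimal sketch of this construction is given below, assuming a fixed number of lags and a hypothetical monthly series (the helper `make_lagged_samples` and the values are illustrative; the study in [125] selected predictors and lag times with a distributed lag nonlinear model, which is not reproduced here).

```python
def make_lagged_samples(series, n_lags, lead):
    """Pair the n_lags most recent values at time t with the value at t + lead.

    Features are ordered from lag 0 (current value) to lag n_lags - 1.
    """
    samples = []
    for t in range(n_lags - 1, len(series) - lead):
        features = [series[t - k] for k in range(n_lags)]
        target = series[t + lead]
        samples.append((features, target))
    return samples

# Hypothetical monthly index values (illustrative only)
spei = [0.1, -0.4, -1.2, -0.8, 0.3, 0.9, 1.1, 0.2]

# Three lagged values as predictors, forecast two months ahead
samples = make_lagged_samples(spei, n_lags=3, lead=2)
```

Each lead time from one to six months would use its own set of samples built this way, so that no information from after the forecast time enters the predictors.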
Different from index-based drought predictions, the performance of the optimized RF, DT, and LSTM models was compared in predicting
,
,
, and
in low flow periods, corresponding to drought events, as well as for the entire monitoring period across the Netherlands [
126]. The predictors included daily
P,
,
associated with the main rivers feeding the river system of the country, sea level, and their first three lags with or without water management decisions during previous droughts, accounted for by reconstructed historical
of the main water infrastructures. Using 60% of the data acquired from ∼4000 stations between 1980 and 2019 for model training, RF provided the best overall accuracy. The AI models reportedly resulted in acceptable predictions for
,
, and
, but relatively less prediction accuracy for
. Although predictors associated with the water management decisions did not improve prediction accuracy by more than 9%, they appeared to be critical features at some locations. The authors tried to predict
and
in low flow periods, at which RF and LSTM performed better, yet the predicted
and
were 15–20% and 5–12% lower than observed values and did not reliably capture the prolonged 2018 drought. Although the authors called the AI models in their study black-box models, the DT and RF models are not black-box models [
10], as these models are amenable to coupling with the SHAP and LIME methods (forming XAI models) that can unveil interpretable relationships between predictors and predictand, explain model decisions, and reveal new knowledge, as discussed in
Section 2. Moreover, the authors used model coefficients from statistical models (e.g., LASSO) to determine which predictors have an inverse relationship with the predictand. However, such information can be readily and accurately obtained using RF-SHAP (an XAI model) without resorting to statistical models [
17,
18].
4.2.11. Climate Change Impacts Modeling
Global circulation models (GCMs) that simulate physical processes in the atmosphere, ocean, cryosphere, and land surface are the primary tools to generate climate forecasts. When compared with surface observations, these models, however, suffer from biases and are unable to provide ready-to-use information at the regional spatial scales. Therefore, downscaling methods are commonly used to link the coarse-resolution global simulated predictors to the local observed predictand over the area of interest [
127]. IAI and XAI models have been recently used to develop procedures for multi-model ensemble climate simulations and forecasting hydroclimatic variables under future climate scenarios.
An optimized RF model was used to develop a procedure for multi-model ensemble climate simulations from 24 Coupled Model Intercomparison Project Phase 6 (CMIP6) models to capture the characteristics of the spatially varying observed climatic data across China [
128]. Each CMIP6 model was treated as a feature in the RF framework. The split ratio for the training and testing data was ∼60:40, and the length of the training data was 31,552. The predictors of the IAI model included
, annual
, annual
, total
P in wet days, annual maximum consecutive 5-day
P amount, and annual total
P for events exceeding the 95th percentile. The authors reported that RF exhibited higher predictive accuracy than LR and simple arithmetic mean. They subsequently used the trained RF model to predict the regional projection of future climate for 1.5 °C, 2 °C and 3 °C global warming targets, relative to preindustrial levels, under the SSP5 emission scenario. SSP5 is the worst-case climate scenario, in which the future presumably heavily relies on intense use of fossil fuels without implementing sound adaptation and mitigation strategies. Although CMIP6 models were used as features in their RF model, the relative conformity (ranking) of the CMIP6 models to the observed data was not disclosed. This could have been effectively implemented with an RF-SHAP approach (an XAI model). We expect that the order of conformity of the CMIP6 models would vary with geographic regions and climatic zones. Therefore, it would be useful to know which CMIP6 models would be more representative for certain geographic regions and climatic zones across the globe.
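The conformity of individual ensemble members to observations can be illustrated with a much simpler device than RF-SHAP. The sketch below ranks hypothetical members and their arithmetic mean by root-mean-square error against observations; the member names and all values are invented for illustration, and this RMSE ranking is a stand-in that does not reproduce the RF-based procedure of [128].

```python
from math import sqrt

def rmse(simulated, observed):
    """Root-mean-square error between a simulated and an observed series."""
    return sqrt(sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed))

# Hypothetical observations and ensemble members (illustrative only)
observed = [10.0, 12.0, 11.0, 13.0]
members = {
    "model_A": [10.5, 12.5, 11.5, 13.5],   # small constant positive bias
    "model_B": [8.0, 10.0, 9.0, 11.0],     # large constant negative bias
    "model_C": [10.0, 12.0, 11.0, 15.0],   # accurate except for one value
}

# Simple arithmetic-mean ensemble across members at each time step
ensemble_mean = [sum(values) / len(values) for values in zip(*members.values())]

scores = {name: rmse(series, observed) for name, series in members.items()}
scores["arithmetic_mean"] = rmse(ensemble_mean, observed)

# Lowest error first: an ordering of conformity to the observations
ranking = sorted(scores, key=scores.get)
```

Repeating such a ranking per region or climatic zone would show whether the order of conformity changes geographically, which is the question raised above; a SHAP-based analysis would additionally attribute individual ensemble predictions to each member.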
As for predicting hydroclimatic variables under potential future climates, the optimized RF model was used to predict potential changes in water regime types in the northwest of the European part of Russia for the period of 2087–2099 using projected monthly runoff data from GCMs [
129]. The authors divided the study area into uniform grids with the spatial resolution of 0.5° × 0.5°. They reanalyzed and computed historical monthly runoffs using the GR4J hydrological model [
130] at each grid cell, which furnished the predictors for the IAI model. The RF model was trained using historical data, including the categorical water regime types as the predictand and monthly runoffs as the predictors. The authors used four GCMs, including GFDL-ESM2M, HadGEM2-ES, IPSL-CM5A-LR, and MIROC5, with three representative concentration pathways (RCPs), namely RCP 2.6, RCP 6.0, and RCP 8.5, to estimate future projected monthly runoffs. Here, RCP 2.6 represents the future with widely used renewable green energy, while RCP 8.5 represents the future with intense use of oil and gas for energy production. RF was used to predict the spatial distribution of water regime types across the study area using monthly runoff computed by the GR4J model with projected climate data from the GCMs under different RCP scenarios. The analysis suggested that water regime types could alter over 73.6% and 99% of the study area under the RCP 2.6 and RCP 8.5 scenarios, respectively, during the 2087–2099 period. Moreover, the summer and winter flows could be less stable and spring flow peaks could be lower while shifting to earlier times. Although the authors used historical and projected climate data in calculating monthly runoff using the hydrological model, climate variables could also have been used as predictors in the IAI model. In this case, interdependencies and the importance of the predictors as well as their critical values responsible for changes in water regime types could have been determined by using the RF-SHAP model (i.e., an XAI model).
Moreover, a novel optimized XGBoost-based XAI framework to predict long-term
and decadal hydrological droughts in an ecologically fragile groundwater-dependent semi-arid region in south-central Texas, USA under projected future climate scenarios from 2021 to 2100 was presented in [
20]. The severity of future hydrological droughts was assessed based on the groundwater pumping reductions that would have been mandated if the tiered critical period management pumping restriction plan, part of the current habitat conservation measures at the site, had been implemented during the seven-year-long worst drought that the study region experienced in the 1950s. Groundwater pumping reductions in this plan hinge on
at an index well. The authors first set up the XAI model to predict weekly
from a set of weekly features, including historical lagged
, lagged and current
P, and current
and
. They used the recorded weekly climate data from 1950 to 2005 to train the XGBoost model. When combined with the SHAP method, the trained XGBoost model revealed that the first lag of
and
P, in addition to
were the most decisive features to predict
. The trained XAI model predicted
from 2006 to 2020 with high accuracy when historical climate data or Coupled Model Intercomparison Project Phase 5 (CMIP5) data under the RCP 4.5 and 8.5 scenarios were used as input. In their study, CMIP5 data were downscaled using the Multivariate Adaptive Constructed Analogs (MACA) [
131]. Subsequently, the validated XGBoost model was used with the CMIP5-MACA projected
and
P to forecast weekly
and decadal hydrological droughts from 2021 to 2100 under the RCP 4.5 and 8.5 scenarios. The XAI model additionally revealed that despite an increasing precipitation trend, compound effects of increased evapotranspiration, lower soil moisture, and reduced diffuse recharge due to warmer temperatures could amplify severe hydrological droughts that lower groundwater levels, if regional-scale climate adaptation and mitigation strategies are not implemented.