A Modeling Comparison of Groundwater Potential Mapping in a Mountain Bedrock Aquifer: QUEST, GARP, and RF Models

: In some arid regions, groundwater is the only source of water for human needs, so understanding groundwater potential is essential to ensure its sustainable use. In this study, three machine learning models (Genetic Algorithm for Rule-Set Production (GARP), Quick Unbiased Efficient Statistical Tree (QUEST), and Random Forest (RF)) were applied and verified for spatial prediction of groundwater in a mountain bedrock aquifer in Piranshahr Watershed, Iran. A spring location dataset consisting of 141 springs was prepared by field surveys, and from this three different sample datasets (S1–S3) were randomly generated (70% for training and 30% for validation). A total of 10 groundwater conditioning factors were prepared for modeling, namely slope percent, relative slope position (RSP), plan curvature, altitude, drainage density, slope aspect, topographic wetness index (TWI), terrain ruggedness index (TRI), land use, and lithology. The area under the receiver operating characteristic curve (AUC) and true skill statistic (TSS) were used to evaluate the accuracy of models. The results indicated that all models had excellent goodness-of-fit and predictive performance, but that RF (AUC mean = 0.995, TSS mean = 0.89) and GARP (AUC mean = 0.957, TSS mean = 0.82) outperformed QUEST (AUC mean = 0.949, TSS mean = 0.74). In robustness analysis, RF was slightly more sensitive than GARP and QUEST, making it necessary to consider several random partitioning options for preparing training and validation groups. The outcomes of this study can be useful in sustainable management of groundwater resources in the study region.


Introduction
Groundwater is the most vital natural resource in many arid and semi-arid regions, where it is often the only source of water for human needs. Groundwater resources are related to some important social issues such as environment, resources, and population. Population growth leads to increasing demand for water in the domestic, industrial, and agricultural sectors. In both rural and urban areas in arid and semi-arid regions, almost 90% of all water used primarily derives from groundwater [1]. As the demand for groundwater increases around the world, groundwater potential mapping is becoming an essential tool in enabling successful groundwater management schemes, determination, and safety [2].
The methods used for groundwater potential mapping can be grouped into two main categories: knowledge-driven techniques based on the conceptual component and data-driven techniques based on the empirical component of groundwater evaluations. The knowledge-driven techniques, e.g., analytical hierarchy process (AHP), are based on expert experience [3,4] and are thus affected by the knowledge and subjectivity of experts, while they are also generally site-specific. The data-driven techniques involve probabilistic, statistical, and data mining approaches, and the quality and quantity of data are the major characteristics affecting the predictive accuracy. In the literature, various types of probabilistic and statistical techniques utilizing bivariate and multivariate methods have been employed for preparing groundwater potential zone maps such as frequency ratio [5,6], the weight-of-evidence method [7], Dempster-Shafer theory [8,9], logistic regression [10,11], evidential belief function [12,13], certainty factor [14,15], statistical index [9], and index of entropy [16]. In recent years, machine learning techniques (MLTs) have been applied for mapping of groundwater potential. Various MLTs, such as aquifer sustainability factor [17], classification and regression tree [18], random forest [19][20][21], boosted regression tree [22], maximum entropy [19], artificial neural network model [23], and generalized additive model [2], have been found to be helpful for modeling groundwater potential. In previous studies, MLTs have provided better accuracy in many situations due to their ability to handle, in a robust manner, data characterized by a non-linear format, representing different scales, and deriving from different sources [24][25][26]. Recently, Chen et al. [27] devised methodology for preparing groundwater potential maps that involves kernel logistic regression, random forest, and alternating decision tree models. The results of their research indicated that the artificial intelligence technique can be considered an accurate and valid approach for groundwater potential mapping. Bui et al. [28] proposed a hybrid computational intelligence technique, "AB-ADTree", which is a combination of alternating decision tree classifier and AdaBoost ensemble, for groundwater potential mapping and compared it with other models, including single ADTree, support vector machine, stochastic gradient descent, logistic model tree, logistic regression, and random forest (RF). The results revealed that the hybrid AB-ADTree was a robust and powerful model for groundwater potential mapping in the study area, and significantly improved the predictive capability of the ADTree-based classifier [28]. Davoudi Moghaddam et al. [29] evaluated the effect of sample size on the accuracy of various individual and hybrid techniques, including adaptive neuro-fuzzy inference system, ANFIS-imperial competitive algorithm, alternating decision tree, and RF, to model groundwater potential for a number of springs ranging from 177 to 714. The outcomes of their study provide valuable reference and practical guidelines for choosing machine learning models when a complete spring inventory in a watershed is unavailable.
The aim of the present study was to delineate potential zones for groundwater springs by employing Genetic Algorithm for Rule-Set Production (GARP), Quick Unbiased Efficient Statistical Tree (QUEST), and Random Forest (RF), three state-of-the-art machine-learning models. The GARP modeling technique uses a training and testing process to analyze the relationships between environmental variables and spatial data. A benefit of the GARP model compared with other models is that it allows various outputs to be run to reach an optimum result [30]. On the other hand, the QUEST model has many benefits over other algorithms, as it is a fast, efficient, and impartial statistical tree, it employs an unbiased or linear variable selection model, and it applies imputation instead of substitute splits to handle missing values [31]. The RF model, a modern nonparametric method, has proven to be one of the best learning techniques in recent years. The RF model can be applied for a large dataset and can make precise classifications [32].
All three models can be used for accurate identification of complex relationships between variables in a dataset [33]. They were used here to analyze the relationship between groundwater occurrence and hydrogeological and geological parameters that act as conditioning factors for groundwater occurrence. The novelty of this research and its contribution to the literature is that it describes, for the first time, application of the GARP and QUEST models in groundwater research in a mountain bedrock aquifer (north-western Iran). The results can be useful in future natural resources planning and management. Specific objectives of the study were to: (i) spatially predict groundwater potential by applying the GARP, QUEST, and RF models; (ii) assess the importance of selected geoenvironmental spring-affecting factors in modeling; (iii) compare the predictive performance of these techniques by means of cutoff-dependent and cutoff-independent indices, and (iv) conduct robustness analysis.

Description of the Study Area
The study area was Piranshahr Watershed (36°39′-36°56′ N; 44°49′-45°11′ E) in Western Azerbaijan province, Iran, which is characterized by mountainous morphology and occupies an area of approximately 42,617 ha ( Figure 1). The topographical elevation of the watershed varies between 1433 and 3563 m above mean sea level. Mean annual precipitation is 650 mm and most falls in the form of snow in winter. According to the Emberger classification, Piranshahr Watershed is characterized by a cold semi-arid climate. Rangeland is the major land use in the watershed, occupying 68.5% of total area. The lithology of the watershed varies, with 45% falling within the Precambrian units and Late Cretaceous units, including amphibolite and diabase, respectively (Table  1) [34]. Evaluation of groundwater is vital within this area, because groundwater supplies drinking water and irrigation requirements, and many people living in the area depend on irrigated and dryland farming for their livelihood.

Methodology
In this study, we used groundwater springs as good indications of groundwater availability for groundwater potential mapping. This is because in the study area, as in most other mountainous environments, few hydrogeological and hydrological data (i.e., groundwater levels, well yields, and stream discharge) are available due to a lack of piezometric wells in these high-elevation settings [35]. The methodology used in this research comprised several steps. First, the datasets were prepared for modeling. GARP, QUEST, and RF algorithms were then applied for mapping of groundwater potential. Next, the relative importance of conditioning factors was determined. Then, the area under the ROC curve (AUC), and true skill statistic (TSS) were applied for validation of algorithms in the novel application tested in this study. Finally, robustness analysis was applied to compare the capabilities of the three machine learning models ( Figure 2).

Springs Dataset
The occurrence locations of springs depict groundwater potential across the study area. Hence, a total of 141 spring occurrence locations in Piranshahr Watershed were determined by field surveys and using handheld GPS. For this modeling, the spring inventory locations were randomly divided into two groups, of 70% (99 spring locations) and 30% (42 spring locations), for training and validation steps, respectively. To clarify, the validation group was determined through randomization, and was not applied in model building. It is generally suggested in spatial modeling to apply approximately equal proportions of spring-present (1) and spring-absent (0) pixels [10]. Keeping this in mind, the 141 spring-present pixels and 141 randomly generated spring-absent pixels were applied in the modeling. Note that to investigate the robustness of the models, three training and three validation datasets (S1-S3) were created by performing random selection of the dependent variable (spring occurrence locations) in a GIS environment ( Figure 3) [36,37]. The training datasets included 99 spring occurrence locations, while the validation datasets consisted of 42 spring occurrence locations.

Geo-Environmental Factors
Drawing on a comprehensive literature review, the most representative groundwater conditioning factors were chosen to model groundwater potential. Regarding data availability, 10 groundwater effective factors were chosen for this investigation, comprising slope percent, relative slope position (RSP), plan curvature, altitude, drainage density, slope aspect, topographic wetness index (TWI), terrain ruggedness index (TRI), land use, and lithology.
To evaluate the multicollinearity of groundwater conditioning factors and to avoid bias, we applied the variance inflation factor (VIF) and tolerance (TOL) indices, which are customarily used to estimate multicollinearity of predictive variables in geospatial modeling [38]. According to the literature [38], VIF > 10 or tolerance < 0.1 indicates critical multicollinearity, which itself recommends that the factor has such a strong correlation to others that it should be eliminated from modeling [38].
Then, a vector-type spatial database was extracted by converting groundwater conditioning factors using ArcGIS. A digital elevation model (DEM) with 20 × 20 m grid size was prepared using survey base points, with elevation contours displaying the elevation values extracted from 1:50,000scale topographic maps. The DEM was used as input to extract thematic maps of slope percent, RSP, plan curvature, altitude, slope aspect, TWI, and TRI (Figure 4a-h).
Slope factor has a very significant role in groundwater potential mapping, and therefore a slope percent map of the study area was prepared [29] (Figure 4a). RSP can be applied to identify topographic characteristics such as valleys, ridge tops, flat surfaces, mid-slopes, foot-slopes, and upper slopes, with the value for RSP ranging from 0 (flat surface and valleys) to 1 (upper-slopes and ridge tops) [39]. In this study, a RSP map was prepared using SAGA-GIS ( Figure 4b). Regarding plan curvature, positive curvature indicates convex forms, zero curvature represents flat surfaces and negative curvature represents concave forms. A plan curvature map of the study area was prepared using SAGA-GIS ( Figure 4c). Elevation has important effects on climate conditions in any area, because of differences in vegetation and soil [40]. Accordingly, an altitude map of the study area was prepared ( Figure 4d). Drainage density is calculated as the ratio of the sum of drainage lengths in a grid cell and the area of the corresponding cell, and displays the potential water flow throughout the study area [6]. Here, the drainage density map was calculated considering a 20 × 20 m grid cell ( Figure  4e). Slope aspect in the study area was classified into nine groups, including flat land and eight directions ( Figure 4f).
Topography plays an essential role in the spatial variation of hydrological characteristics such as soil moisture and groundwater flow, and hence secondary topographic indices can be applied to describe spatial soil moisture patterns [41]. Topographic wetness index (TWI) is a subordinate topographic factor within the runoff model, defined as [41]: where ß is the cumulative upslope area draining through a point (per unit contour length) and tan is the slope angle at the point. The tendency of water to accumulate at any point in a catchment (in terms of ß) and the tendency of gravitational forces to move that water downslope (illustrated in terms of tan as an approximate hydraulic gradient) are reflected by the ( tan ) index. Water infiltration mainly depends on material characteristics such as permeability, soil strength, and pore water pressure. Consequently, TWI was considered as a contributing factor in this study ( Figure 4g). Terrain ruggedness index (TRI) is the mean difference between a central pixel and its surrounding cells and was used here to quantify landscape heterogeneities ( Figure 4h). It is defined as [42]: where zc is the elevation of a central cell and zi is the elevation of one of the eight neighboring cells (i = 1, 2, 3, …, 8).
Using Landsat images and the maximum likelihood and supervised classification algorithms, a land use map of the study area with spatial resolution 20 × 20 m was generated. Four main land use categories were identified: agriculture, dryland farming, residential, and rangeland ( Figure 4i). One of the most significant factors in predicting groundwater potential zones is lithology. Using the 1:100,000-scale geological map, a lithology map of the study area was obtained and classified into 10 categories ( Figure 4j, Table 1).
All 10 conditioning factors were converted to a raster grid with 20 × 20 m cells for application of the GARP, QUEST, and RF models.

Description of Models
Genetic Algorithm for Rule-Set Production (GARP) The GARP model is inspired by algorithms of genetic evolution and is a technique that was particularly developed in the context of ecological niche modeling [30]. GARP develops predictive techniques consisting of a set of conditional rules in the form of 'if-then' statements that explain the ecological 'niches' of the species [43]. Niches are recognized by searching iteratively for non-random correlations between known locations of species and a variety of environmental factors. The GARP technique consists of a set of mathematical rules based on environmental conditions. Given a particular environmental condition, if all rules are satisfied, the technique predicts presence of the species. Different types of rules are possible, including atomic, logistic regression, negated bioclimatic envelope, and bioclimatic envelope. All sets of rules are an individual of a population in the representative context of a genetic algorithm [44]. GARP creates an initial population, which is assessed in each iteration to examine if the problem has converged to a solution. Genetic operators (join, cross-over, and mutation) are used in the start of each new iteration on the best individuals from previous steps, generating a new population of solutions. GARP is therefore a non-deterministic technique that generates boolean responses (present/absent) for each various environment condition.
The GARP model was applied here, as DesktopGarp [45], to predict groundwater potential in the study area, as a presence-only modeling tool to analyze the relationship between the groundwater potential dataset and geo-environmental factors through an iterative process, applying conditional rules for modeling that enhanced the stability of model output [46]. DesktopGarp is an integrated, user-friendly software and is able to import GIS layers of groundwater conditioning variables directly, with presence-only occurrence data [47]. This feature makes application of this model easier, since absence or non-occurrence data are not always available, especially in data-scarce areas such as Iran and Middle Eastern countries. However, with GARP, which is claimed to be robust with its default settings, there is a need to be careful with some model parameters, such as the size of quadrates for the background environmental variable layers and the extent of the study area [47]. The only data needed as input for DesktopGarp in the present application were the spring occurrence locations and the environmental datasets formed of groups of environmental factors. Performing multiple runs to achieve various outputs of the GARP model and applying the best-subset method are vital in choosing the best output with optimum parameters. The set of techniques that obtains harmony between omission (sensitivity) and commission (specificity) error thresholds is defined by the user [46,48]. The output of GARP in this case was a collection of grids of the study area, which was applied in a GIS environment to identify areas with groundwater potential [46].

Quick Unbiased Efficient Statistical Tree (QUEST)
The QUEST model is a binary split-decision tree model for data mining and classification that yields subsets of data that are as homogeneous as possible with respect to the response variable [48,49]. QUEST is an efficient, quick, unbiased, and statistical tree-structured classification algorithm that produces a growing binary-split decision tree [50]. In addition, it employs a sequential tree growing technique, which uses a linear discriminate analysis method in splitting of tree nodes. This has many benefits over recursive tree construction techniques such as classification and regression tree (CART) [51]. Furthermore, it is unbiased in selecting splitting rules and does not use an exhaustive variable search routine [52]. By using an unbiased variable selection method in modeling, QUEST has negligible bias [53]. As a result, it can easily cope with categorical and continuous factors [50,54,55]. In other words, the process of tree growing in QUEST consists of choosing a split predictor, choosing a split point for the selected predictor, and stopping. In this technique, only univariate splits are considered. The following steps are involved in a choosing split predictor: First, for each continuous predictor , carry out an ANOVA -test that tests if all the various classes of the dependent variable have the same mean of , and compute the value in accord with the statistics. For each categorical predictor, carry out Pearson's 2 -test of and the independence of , and compute the value in accordance with the 2 statistics. Then, find the predictor with the smallest value and denote it * . If this smallest value is less than / , where ∈ (0, 1) is a user-specified level of significance and is the total number of predictor variables, predictor * is chosen as the split predictor for the node. If not, for each continuous predictor , compute Levene's statistic in accordance with the absolute deviation of from its class mean to test if the variances of for various classes of are the same, and compute the value for the test. Then, find the predictor with the smallest value and denote it * * . Finally, if this smallest value is less than /( + 1), where 1 is the number of continuous predictors, * * is chosen as the split predictor for the node. Otherwise, this node is not split [56].

Random Forest (RF)
The RF model is an ensemble classifier that consists of many decision trees, applying the random forest algorithm of Breiman for classification and regression [57]. When classifying a new object from an input vector, RF runs the input vector down each of the trees in the forest. Each tree gives a classification (which is usually called the tree 'votes' for that class). The forest chooses the classification with the most votes (from all the trees in the forest). The RF model is not sensitive to the problem of multicollinearity; it manages correlated variables well. With large databases, it runs efficiently and it can process thousands of input variables without variable deletion. It prepares estimates of variables that are important in the classification, it is robust with unbalanced datasets and missing data, and it offers an experimental technique for detecting variable interactions [57]. RF requires the modeler to choose two parameters: the number of variables and the number of trees. Furthermore, RF provides two importance ranking indices, "mean decrease Gini" and "mean decrease accuracy" (also termed percent increase in mean square error (MSE)) [58]. The higher the MSE percentage, the greater the importance of the independent variable considered [59]. These indices can be applied to reveal the sensitivity of the model to input variables and to help decrease uncertainty. In this study, this model was used by applying the 'randomForest' package in R. and the default of 500 trees (parameter "ntree") was used as error rate stabilized at 100-150 trees [26]. A random one-third of predictor variables was used to carry out data partitioning at each node (parameter "mtry"). Two-thirds of the overall dataset were randomly chosen and utilized to create the RF model, and the other one-third retained for testing the model.

Accuracy Assessment of Models
The outputs of models (maps) are not reliable without validation [60,61]. In this study, the accuracy and suitability of the models was assessed using both cutoff-dependent and cutoffindependent methods. One of the most widely used metrics of performance is receiver operating characteristics (ROC) curve, which has been used in many hydrological and geo-environmental studies such as susceptibility and potential modeling of flood inundation, landslide, gully erosion, and groundwater [62,63]. The area under the ROC curve (AUC-ROC), a cutoff-independent metric, was constructed in this study using groundwater potential maps and the spring locations allocated to the training dataset [64,65]. Its value ranges from 0 to 1, with higher values indicating better model performance [66,67]. To plot and calculate the area under the ROC curve, the validation dataset was applied.
Among cutoff-dependent criteria, the true skill statistic (TSS) was selected in this study. TSS can be computed as [68]: where TPR describes true positive rate and FPR depicts false positive rate. TPR and FPR can be computed by applying the following equations [69]: where TP = true positives, TN = true negatives, FP = false positives (Error Type I), and FN = false negatives (Error Type II). TP and TN are the numbers of pixels that are correctly grouped, while FP and FN are the numbers of pixels that are erroneously grouped. The value of TSS ranges from −1 to +1, where −1 indicates that predictive performance is not better than that of a random model, 0 indicates an indiscriminate model, and +1 indicates a perfect model [68].
The robustness of a model can be investigated by assessing the change in model efficiency when small changes in input data are made [37,70]. In this study, three datasets were applied for both training and validation steps. The three models were run using the training datasets, and then verified using the validation datasets. "The robustness of the predictive model can be defined as the constancy of the model's prediction accuracy when the training and validation data sets are changed" [71]. The robustness of the models was computed by differentiating the maximum and minimum accuracy values regarding each evaluation criterion [36]: where RTSS, and RAUC are the robustness of a model in terms of TSS and AUC, respectively, TSSmax and AUCmax are the maximum values of accuracy among the three datasets, and TSSmin and AUCmin are the minimum values of accuracy among the three datasets. These analyses were carried out in both training and validation steps.

Multicollinearity Analysis of Predictive Factors
The outputs of multicollinearity analysis revealed that the highest VIF value was 3.547 and the lowest TOL value was 0.238, which shows that there was no multicollinearity among the predictive factors (Table 2). Hence, all the predictive factors were applied in modeling.

Application of the Models for Groundwater Potential Mapping
Figures 5-7 show groundwater potential maps produced by the GARP, QUEST, and RF models, respectively, for the three different training datasets (a-c).   The groundwater potential maps created by sample dataset 3 (because of higher accuracy in validation step) and the GARP (Figure 5c), QUEST (Figure 6c), and RF (Figure 7c) models were then imported into ArcGIS 10.3 software and, using the quantile classification scheme [19,71], categorized into five different classes (very low, low, moderate, high, and very high potential) (Figure 8).  Table 3 shows the watershed area (%) occupied by these five groundwater potential classes determined according to the different models and sampling dataset 3 ( Figure 8).

Accuracy of the Groundwater Potential Maps
The results of goodness-of-fit are shown in Table 4 and Figures  Hence, all three models showed good-excellent performance. In fact, the TSS and AUC-ROC criteria demonstrated substantial agreement between the trained models and reality. However, as the training datasets were applied to generate the models, they could not be used to evaluate their prediction capability. In other words, the goodness-of-fit step only showed how well the models fit the training dataset, and could not be interpreted as indicating the prediction ability of the model [38]. The validation analysis showed how well the models predicted groundwater potential in the study area. In the validation step, the results of the three models were verified by applying validation datasets and the two evaluation criteria (Table 5)

Robustness Analysis
Model robustness results according to the two different evaluation criteria are presented in Figure 9. There were only slight variations in these results following changes in the training and validation samples. Therefore, it can be concluded that all three models were stable and robust in both the training and validation steps. However, as can be seen in Figure 9a, the stability of GARP (RTSS = 0.070) was slightly better than that of QUEST (RTSS = 0.080) and RF (RTSS = 0.120). As shown in Figure 9b, QUEST (RAUC ≥ 0.018) displayed high stability and robustness, while GARP (RAUC = 0.033) and RF (RAUC = 0.053) showed slight asymmetry in their results. In terms of both RTSS and RAUC, RF showed lower stability than the other models.

Importance Analysis of Predictive Factors
The variable importance assessment in this study was conducted using the RF model. As one of its main benefits, RF can estimate the importance of predictor variables applied in the modeling process using the index percentage increase in mean square error (PIMSE) [59,71]. Table 6 shows the importance of predictor variables in predicting groundwater potential. As can be seen, RSP, lithology, and TWI were the three most important variables for spatially predicting groundwater potential, with PIMSE equal to 82.6%, 69.2%, and 55.1%, respectively.

Discussion
Machine learning techniques have been gaining popularity in the field of spatial modeling, especially groundwater potential modeling. Machine learning models show particular promise when dealing with the challenge of groundwater potential mapping in mountainous and large regions, which may not have sufficient hydrogeological and geotechnical data to run numerical and/or physically-based models [72]. Thus new models must be developed and their capability and applicability evaluated, to find the most accurate and robust model with the best possible results from the least amount of input data. However, there are various sources of uncertainty in spatial modeling of groundwater potential, including those related to input variables, sampling strategy, and the modeling process [73]. According to Garosi et al. [74], random classification of datasets for training and validation steps can be a significant cause of uncertainty in groundwater potential modeling using groundwater spring occurrence locations. From the validation findings in this study and based on cutoff-dependent and cutoff-independent evaluation criteria, it is obvious that all three models tested showed good to excellent performance in predicting groundwater potential. Importantly, the results demonstrated that the RF model had the highest TSS and AUC values for all training and validation datasets, even when based on various evaluation criteria, but this model exhibited slightly lower robustness (but acceptable) compared with the other models. The values of AUC for the validation dataset showed that GARP and QUEST achieved excellent and good performance, respectively.
An interesting finding was that, while RF was identified as the best model in terms of accuracy, it had slightly lower robustness for groundwater potential modeling using spring occurrence locations. This is partly consistent with findings by Rahmati et al. [20] and Naghibi et al. [75], who concluded that RF is a powerful predictive model that can run accurately with large databases and can process thousands of input variables without variable deletion. Similarly, Davoudi Moghaddam et al. [29] found that RF prepares estimates of the variables that are vital for classification, is robust with unbalanced datasets and missing data, and offers an experimental technique for detecting variable interactions [57]. In an earlier study in the Bojnourd Watershed, Iran, RF also demonstrated excellent performance in groundwater potential modeling [20] and similar findings have been made in some other areas [75,76]. However, as illustrated by the results in the present study, RF was slightly sensitive to random changes in the input data, although its robustness was still within the acceptable range. Thus while the RF model was found to generate more accurate predictions in the present case, it is important to recognize the main sources of variations that influence its robustness. Furthermore, from Figure 5 and Table 3 it is clear that GARP indicated different repartition of groundwater potential areas than the QUEST and RF models. This can be related to the fact that GARP is a presence-only approach that is appropriate for most cases in which true absence data are lacking. Theoretically, it fits models from background and presence data, whereas other presenceonly models either generate pseudo-absence data by sampling points outside the boundary of presence data, or strictly use presence-only modeling [77,78]. GARP projects the realized niche into a geographical space without giving weight to observed absence data, which could cause a poorer fit to the current observed distribution than presence-absence approaches such as RF and QUEST, which are tree-based models [79]. Therefore, the reason why the outputs of QUEST and RF are more similar is because they belong to the same (tree-based) model group. Such models provide a transparent treelike structure with easily interpreted rules [38]. In addition, they use a type of statistical analysis with no statistical distribution assumption and they can handle data from various scales. Furthermore, they permit identification of homogeneous groups with various potential levels, and facilitate construction of rules for prediction of complex relationships [80]. Overall, according to the groundwater potential maps obtained (Figure 8), central and eastern parts of Piranshahr Watershed have the highest potential for groundwater and should be the subject of further investigation and exploitation.
In addition, credible knowledge of the relationships between predictor variables (e.g., geoenvironmental factors, etc.) and the respective target variable (i.e., groundwater potential) can help decision-makers achieve sustainable groundwater management, but these associations are still somewhat complicated and poorly documented [29]. The results of variable importance analysis using the RF model indicated higher importance of RSP, lithology, TWI, altitude, and drainage density, which shows the significant impact of DEM-derived factors on groundwater potential modeling ( Table 3). Application of RSP is novel in the field of groundwater potential mapping and proved here to be a significant conditioning factor in the modeling process. Higher contributions of the RSP and lithology factors have also been reported in a previous study on groundwater resources [71]. Overall, DEM-derived factors were found to be very important in the present study. This could be related to the impact of topography on flow speed on slopes, and subsequently on permeability and erosion ratio. Therefore, these factors should be prepared with higher accuracy and finer spatial resolution. Focusing on DEM-derived factors might increase the performance of the models. Different groundwater conditioning factors are reported to be important in groundwater potential modeling, but so far there are no comprehensive and universal guidelines on choosing independent variables among geo-environmental factors. For example, Rahmati et al. [8] mention altitude, drainage density, and lithology as the most significant factors, while Naghibi et al. [2] list NDVI, altitude, and slope angle as the most significant conditioning factors. This lack of agreement on the importance of conditioning factors could be associated with variations in climate, geological, and topographical characteristics between study areas. Importance analysis of conditioning factors is useful for multisource groundwater studies and solves the problem of high data dimensionality, by eliminating redundant predictor variables [71]. This can help decision-makers allocate resources to accurate determination of variables that significantly affect the accuracy of groundwater potential mapping.

Conclusions
This study evaluated the capability of two machine learning techniques, GARP and QUEST, for predicting groundwater potential for the first time, and also compared their capability and robustness with that of RF, a state-of-the-art model. The importance of groundwater conditioning factors was also evaluated. The following main conclusions were reached:

•
All three machine learning models showed good-excellent accuracy (AUC > 0.7) in prediction of groundwater potential, although there were some differences between the models in terms of predictive performance and robustness. Thus even when conducting model comparisons using clear criteria, such as prediction performance and robustness, understanding limitations and strengths remains somewhat difficult in model selection and identifying the best model. Here, RF showed the best performance based on two cutoff-dependent and cutoff-independent evaluation criteria, but exhibited slight sensitivity (lack of robustness) to changes in the training/validation data. In terms of pure prediction performance, GARP and QUEST showed slightly lower accuracy than RF. Therefore, RF provides the most accurate groundwater potential mapping, with acceptable robustness.

•
Variable importance analysis demonstrated that RSP was the most important groundwater conditioning factor, followed by lithology factor, while slope, aspect, and plan curvature were the least important factors for modeling. These findings could be of practical help for water resources managers in handling the existing uncertain situation and understanding various aspects that affect groundwater potential.

•
In future research, other metrics could be included in model comparisons, such as time spent in model generation and prediction calculation. A focus on DEM-derived factors and on obtaining high accuracy data could enhance the performance of the models. In order to raise the accuracy and reduce the uncertainties in models, hybrid modeling can be recommended, as it reduces some problems in classification models such as over-fitting.
Supplementary Materials: The following are available online at www.mdpi.com/xxx/s1. Figure S1: ROC curves of the training and validation steps for the RF model using dataset S1, Figure S2: ROC curves of the training and validation steps for the RF model using dataset S2, Figure S3: ROC curves of the training and validation steps for the RF model using dataset S3, Figure S4: ROC curves of the training and validation steps for the GARP model using dataset S1, Figure S5: ROC curves of the training and validation steps for the GARP model using dataset S2, Figure S6: ROC curves of the training and validation steps for the GARP model using dataset S3, Figure S7: ROC curves of the training and validation steps for the QUEST model using dataset S1, Figure S8: ROC curves of the training and validation steps for the QUEST model using dataset S2, Figure S9: ROC curves of the training and validation steps for the QUEST model using dataset S3. Funding: This research received no external funding.