A Hybrid Computational Intelligence Approach to Groundwater Spring Potential Mapping

Abstract: This study proposes a hybrid computational intelligence model that is a combination of alternating decision tree (ADTree) classifier and AdaBoost (AB) ensemble, namely “AB–ADTree”, for groundwater spring potential mapping (GSPM) at the Chilgazi watershed in the Kurdistan province, Iran. Although ADTree and its ensembles have been widely used for environmental and ecological modeling, they have rarely been applied to GSPM. To that end, a groundwater spring inventory map and thirteen conditioning factors tested by the chi-square attribute evaluation (CSAE) technique were used to generate training and testing datasets for constructing and validating the proposed model. The performance of the proposed model was evaluated using statistical-index-based measures, such as positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, accuracy, root mean square error (RMSE), and the area under the receiver operating characteristic (ROC) curve (AUROC). The proposed hybrid model was also compared with five state-of-the-art


Introduction
Groundwater serves as the source of water supply for many sectors, including agriculture, industry, animal husbandry, and communities, in many countries around the world [1,2]. Groundwater is usually the result of rainwater and snowmelt infiltrating into the soil and underlying rocks, where it fills the pore space [3,4]. According to a report by the Bundesanstalt für Geowissenschaften und Rohstoffe [5], groundwater consumption has increased over the last few years to about 1000 km³, while global groundwater recharge has reached 12,700 km³/year. Furthermore, groundwater has a low level of pollution and a wide distribution, which, in turn, has attracted growing human populations throughout the world [6].
In Iran, most people living in rural and urban areas (70%) depend on groundwater as a safe water resource [7]. In recent years, due to climate change and intensive withdrawal of available groundwater resources, many regions of Iran have become dry and semi-dry, which has caused a serious lack of water throughout the country [1,2,8–12]. Since groundwater consumption in Iran has been increasing dramatically, proper methods for assessing aquifer productivity and groundwater potential areas are badly needed. These methods are essential for future systematic development, profitable management, and arresting the decline of groundwater resources [13]. As the demand for fresh groundwater increases, delineating groundwater spring potential zones becomes an important task for determining, managing, and protecting groundwater resources. Groundwater spring potential mapping (GSPM) is therefore important for protecting water quality and for managing and protecting the use of groundwater [14,15].
More recently, machine learning ensemble models have been shown to outperform conventional methods in many fields, especially for natural hazards such as floods [58,59,62,67–71], wildfires [72], sinkholes [73], droughts [74], earthquakes [75,76], gully erosion [77,78], land/ground subsidence [79], and landslides. However, applying these methods to GSPM has always been considered a big challenge. At the same time, because of their flexibility and high prediction power, machine learning ensemble models are increasingly applied in water studies such as GSPM. The literature review shows that some methods have been used to identify areas with high groundwater potential, and that modeling has not only continued but has progressed more rapidly in recent years in many fields. This illustrates that the subject of groundwater is of great importance and is being pursued to obtain high-precision maps, to avoid costly traditional groundwater exploration methods, and to use groundwater aquifers in critical times, especially during drought periods. Producing groundwater potential maps with high prediction accuracy by hybrid techniques therefore seems to be a necessity.
Water 2019, 11, 2013
Therefore, the main objective of this study was to use a hybrid machine learning model for mapping areas with high groundwater potential in the Chilgazi watershed, northwest of Iran. The ADTree algorithm was selected as the single (base) algorithm and AdaBoost (AB) as the meta-classifier algorithm. AB is a powerful ensemble method that works with a series of sub-training datasets: an ADTree is built on each of them, and the results are then combined into the final output. This process enhanced the prediction power of ADTree and gave more reasonable results. Ensembles of the ADTree algorithm are still rare in groundwater potential mapping.
Therefore, this study can be considered a pioneering work in this area. The generated maps can be useful for decision makers, planners, managers, and government agencies for the sustainable management of groundwater resources. The main objectives of this study were (i) applying an ensemble machine learning model, AB-ADTree, for groundwater spring potential mapping; (ii) selecting the most important conditioning factors for groundwater productivity; and (iii) comparing the performance of the applied models and suggesting a promising model for groundwater exploitation as an alternative to traditional methods, such as drilling and hydro-geological, geological, and geophysical surveys. Additionally, we compared and validated the results of the proposed model against six state-of-the-art soft computing benchmark models: logistic regression (LR), logistic model tree (LMT), stochastic gradient descent (SGD), support vector machine (SVM), alternating decision tree (ADTree), and random forest (RF). The modeling and the susceptibility mapping were carried out in Weka 3.6.9 and ArcGIS 10.3, respectively.

Description of Research Area
The Chilgazi watershed, located north of Sanandaj city, Kurdistan province, Iran, lies between longitudes 46°45′ and 46°57′ E and latitudes 35°25′ and 35°28′ N. It covers an area of around 272 km². The elevation of the study area ranges from 1550 to 2859 m (Figure 1). The Ghishlagh Dam is located at the outlet of the Chilgazi watershed. The average annual temperature is 14.2 °C; the average daily minimum temperature in winter is 6.5 °C, and the average daily maximum temperature in summer is about 37 °C. The average annual precipitation is 464.2 mm, more than 75% of which falls from December to April. The climate of the study area is classified as semi-arid according to the De Martonne climatic system [109]. Most of the area is covered by agricultural lands (23,465 ha) and rangelands (3768 ha); barren lands, pastures, residential areas, and gardens are the other types of land use in the study area. Geologically, the study area is part of the elevated Zagros (Northern Zagros), where joints, gaps, and faults have been created. The soil of the study area is mainly semi-deep with a predominantly sandy-loamy texture. Most of the study area is covered by Quaternary deposits, including andesite-basalt (Kvc), Sanandaj shale (KSS), and limestone (Klu). In addition, surface water and groundwater are the two sources of water supply: surface water is often used for irrigation, and groundwater is commonly used for agricultural production as well as domestic purposes.

Data Collection and Interpretation
The locations of springs were determined in three steps: (1) the initial locations of springs, recorded between 2008 and 2010, were acquired from the Iran Water Resources Management Company (IWRMC); (2) these locations were overlaid on the topographic map at a scale of 1:25,000 to check the initial locations; and (3) some springs were randomly checked by field surveying for final confirmation of their locations. Table 1 presents some statistical measures of the springs. The discharge of all springs ranged between 0.2 and 10 L/s. Additionally, the average water temperature (T), electrical conductivity (EC), and potential of hydrogen (pH) of the springs were 14.812 °C, 364.688, and 7.525, respectively.
In this study, the target (dependent) variable is the spring location over the study area in binary coding (spring (1) and non-spring (0) locations), whereas the independent variables (conditioning factors) were selected based on the literature and data availability. Accordingly, a total of 633 springs were recorded and detected, of which 70% (444) were randomly selected for training and the remaining 30% (190) were used for validation of the models using SPSS software. For modeling with machine learning, the dependent variable should be binary, i.e., spring and non-spring occurrences. The locations of springs in the study area were easily recorded by global positioning system (GPS); the non-spring locations, however, were generated randomly over the study area using the "create random point" tool in ArcGIS 10.2. It is assumed that these locations are free from springs and that they do not have enough potential for spring occurrence. Some researchers have used these techniques for modeling groundwater productivity [29,110,111]. Therefore, to construct the datasets, 633 non-spring locations, matching the number of springs, were randomly selected and likewise partitioned into 70% (training) and 30% (validation) for modeling and evaluation, respectively. All spring locations and conditioning factors were converted to a pixel size of 20 × 20 m to construct the final dataset.
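The random 70/30 partition described above can be sketched in a few lines of Python. This is only an illustration, not the SPSS/ArcGIS workflow used in the study; the function name `split_points`, the seed, and the placeholder point identifiers are assumptions introduced here.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Randomly partition points into training and validation subsets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder IDs standing in for 633 spring and 633 non-spring locations.
springs = [("spring", i) for i in range(633)]
non_springs = [("non_spring", i) for i in range(633)]

train_s, valid_s = split_points(springs)
train_n, valid_n = split_points(non_springs)

# Label springs 1 and non-springs 0, as in the binary coding of the target.
training = [(p, 1) for p in train_s] + [(p, 0) for p in train_n]
validation = [(p, 1) for p in valid_s] + [(p, 0) for p in valid_n]
```

Splitting the two classes separately keeps the spring/non-spring ratio identical in the training and validation sets.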


Groundwater Spring Conditioning Factors
Selecting the most relevant conditioning factors (geo-database) related to the occurrence of a spring is a critical issue for GSPM. Hence, based on the literature review, 17 groundwater spring influencing factors were identified and classified into four groups: topography (slope angle, curvature, slope aspect, plan curvature, elevation, profile curvature, and sediment transport index (STI)), hydrology (rainfall, stream power index (SPI), distance to river, topographic wetness index (TWI), and river density), geology (lithology, fault density, distance to fault, and permeability), and land cover (land use) factors (Table 2). To generate the thematic maps (slope angle, slope aspect, elevation, curvature, profile curvature, plan curvature, STI, SPI, TWI, distance to rivers, and river density), a digital elevation model (DEM) of the study area with 20 m spatial resolution was constructed from the topographic map (1:25,000 scale). Hence, a pixel size of 20 × 20 m was selected for all the conditioning factors.
Slope aspect is a well-known conditioning factor of GSPM [12,13,22]. Slope aspect can affect the hydrologic response through solar radiation, soil-water retention, soil porosity, hydraulic conductivity, snow ablation, evapotranspiration, water cycling, and vegetation communities [115–119]. Generally, in the northern hemisphere, north-facing slopes are colder and wetter than south-facing slopes, which are warmer and drier [120]. Therefore, north-facing slopes have more potential for spring occurrence, which indicates that groundwater potential is higher there than elsewhere. The slope aspect map of this area was derived from the DEM with nine classes (Table 2): Flat, North, Northeast, East, Southeast, South, Southwest, West, and Northwest.
Elevation is related to climate and environment and thus affects groundwater springs [121]. It can affect weather and climate, and can influence soil properties and vegetation communities [39]. Basically, the higher the elevation, the greater the potential for springs, because higher elevations receive more rainfall than lower ones. The elevation map of this study was extracted from the DEM and classified into five classes: (1) <1800; (2) 1800-1900; (3) 1900-2000; (4) 2000-2200; and (5) >2200 m (Table 2).
Curvature generally has a negative relationship with groundwater recharge and is therefore considered a conditioning factor affecting groundwater springs [13]. The curvature map of the study area was generated in five categories (Table 2).
Plan curvature and profile curvature are the curvatures of the contour line formed by intersecting the surface with a horizontal plane and a vertical plane, respectively; both affect groundwater springs [83]. Plan curvature describes the divergence and convergence of flow and can affect the concentration of flow on the ground [122], whereas profile curvature can affect pore water pressure, saturation, and recharge, resulting in the development of groundwater. The plan curvature map of the study area was extracted from the DEM and classified into five levels (Table 2).
STI, also referred to as the LS factor, is an important conditioning factor in the study of groundwater springs; it shows the erosion power of overland flow as governed by two structural elements, the sediment carrying capacity of the flow and the evolution of the basin [123,124]. The STI is computed from the following equation:
$$\mathrm{STI} = \left(\frac{A_s}{22.13}\right)^{0.6} \left(\frac{\sin\beta}{0.0896}\right)^{1.3}$$
where $A_s$ is the specific basin area (m²/m), and $\beta$ is the slope gradient [78]. In this study, the STI values of the study area were divided into five classes: (1) 0-3.83; (2) 3.83-8.66; (3) 8.66-13.3; (4) 13.3-18.8; and (5) 18.8-42.5 (Table 2).
SPI has been considered one of the conditioning factors that contribute to groundwater springs [41,42]. Generally, the higher the SPI, the higher the potential for spring occurrence because of a higher water table. It was extracted from the DEM, and SPI values can be computed by the following equation [127]:
$$\mathrm{SPI} = A_s \tan\beta$$
where $A_s$ is the specific basin area, and $\beta$ is the local slope gradient in degrees.
TWI is an important conditioning factor for GSPM [13,128], as the permeability and pore water pressure of materials are affected by water infiltration and soil strength [42]. It has been extensively used to describe the effect of topography on the size and location of saturated source areas that are prone to runoff generation. Basically, areas with higher TWI also have a higher potential for spring occurrence. The following equation was used for the TWI computation [129]:
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right)$$
where $A_s$ is the specific basin area, and $\beta$ is the slope angle at that point.
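As an illustration of how the three DEM-derived indices relate to the same two inputs, the sketch below evaluates them for a single cell. It assumes the commonly used forms STI = (A_s/22.13)^0.6 (sin β/0.0896)^1.3, SPI = A_s tan β, and TWI = ln(A_s/tan β); the function name `terrain_indices` and the sample values are hypothetical, and flat cells (β = 0) would need separate handling in practice.

```python
import math

def terrain_indices(specific_area, slope_deg):
    """Return (STI, SPI, TWI) for one cell.

    specific_area: specific basin area A_s (m^2/m), must be > 0.
    slope_deg: local slope gradient beta in degrees, must be in (0, 90).
    """
    beta = math.radians(slope_deg)
    sti = (specific_area / 22.13) ** 0.6 * (math.sin(beta) / 0.0896) ** 1.3
    spi = specific_area * math.tan(beta)
    twi = math.log(specific_area / math.tan(beta))
    return sti, spi, twi

# Hypothetical cell: A_s = 100 m^2/m on a 10-degree slope.
sti, spi, twi = terrain_indices(specific_area=100.0, slope_deg=10.0)
```

Note that SPI grows with slope while TWI shrinks with it, so steep cells with a given contributing area score high on stream power and low on wetness.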
River density is considered an important conditioning factor for GSPM [114]: when the drainage density is lower, infiltration and recharge are greater [125,131]. The higher the drainage density, the lower the infiltration and the higher the surface runoff, which indicates that this factor has an inverse relationship with groundwater [131]. The river density of the study area varied from 0 to 0.00633 km/km² and was divided into five categories: (1) 0-0.000744; (2) 0.000744-0.00169; (3) 0.00169-0.00248; (4) 0.00248-0.00337; and (5) 0.00337-0.00633 (Table 2).
Distance to fault is another vital factor for studying groundwater springs. This factor can affect infiltration: the shorter the distance from a fault, the higher the potential for infiltration compared with locations farther from the fault. Different types of faults can control the movement of groundwater through the geological structure of an area [15]. Faults of the study area were extracted from the geological map at 1:100,000 scale, and the distance to faults map was constructed with five categories: (1) 0-100; (2) 100-200; (3) 200-300; (4) 300-400; and (5) >400 (Table 2).
Fault density is described as the ratio between the sum of fault lengths in a pixel and the area of that pixel [121]. Areas with more faults, if they receive enough moisture and water, are more likely to develop springs and aquifers than areas with fewer faults. Therefore, these areas easily recharge the groundwater aquifers. The fault density of the study area was calculated from the geological map at 1:100,000 scale and was then divided into five classes: (1) 0-0.000418; (2) 0.000418-0.00114; (3) 0.00114-0.00185; (4) 0.00185-0.00267; and (5) 0.00267-0.00508 km/km² (Table 2).
Permeability is one of the geological factors that affect groundwater spring occurrence through discontinuity structures, such as joints, cracks, and faults. This factor was evaluated using expert knowledge and field surveys based on the lithological units. Eventually, the permeability map was classified into four categories: very low, low, moderate, and high (Table 2).

Logistic Regression (LR)
The LR model has become a widely used and accepted model for analyzing binary outcome variables [134,135], relating the independent variables to the dependent variable. In LR, this relationship is nonlinear [136]. Thus, it was used to describe the relationship between spring occurrence and spring-affected factors, and can be expressed as follows:
$$p = \frac{1}{1 + e^{-m}}$$
where $p$ is the probability of a spring occurrence, and $m$ is the linear combination of a set of spring-affected factors.
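A minimal sketch of the logistic relationship, assuming the standard form p = 1/(1 + e^(−m)) with m an intercept plus a weighted sum of the factors. The function name `spring_probability` and the coefficients are hypothetical and are not fitted values from this study.

```python
import math

def spring_probability(factors, weights, intercept):
    """Logistic model: p = 1 / (1 + exp(-m)), m = intercept + sum(w_i * x_i)."""
    m = intercept + sum(w * x for w, x in zip(weights, factors))
    return 1.0 / (1.0 + math.exp(-m))

# Hypothetical coefficients for two standardized factors (illustration only).
p = spring_probability(factors=[0.8, -0.3], weights=[1.2, 0.5], intercept=-0.4)
```

Whatever the value of m, the output stays strictly between 0 and 1, which is what allows it to be read as a spring probability.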

Logistic Model Tree (LMT)
LMT is a comprehensive approach that combines a decision tree with linear logistic regression and takes advantage of both [137]. Its learning process is fast because a stagewise fitting process is applied to the structure of the LMT [138].
Compared with a traditional decision tree, LMT employs logistic regression functions to estimate the probability of each class, and applies the LogitBoost algorithm to build the logistic regression functions at the nodes of the tree. It uses the well-known CART algorithm [139] for pruning. The posterior probability of each class is determined as follows:
$$P(c \mid x) = \frac{\exp\left(H_c(x)\right)}{\sum_{c'=1}^{C} \exp\left(H_{c'}(x)\right)}$$
where $H_c(x)$ is transformed such that $\sum_{c=1}^{C} H_c(x) = 0$, and $C$ is the number of classes.
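The posterior computation at a leaf can be illustrated as a softmax over per-class scores. This sketch assumes the usual LogitBoost-style form, in which the scores H_c(x) are centred to sum to zero before exponentiation; the function name `class_posteriors` and the example scores are hypothetical.

```python
import math

def class_posteriors(scores):
    """Softmax over per-class scores H_c(x), after centring them to sum to zero."""
    mean = sum(scores) / len(scores)
    centred = [h - mean for h in scores]
    exps = [math.exp(h) for h in centred]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for the spring and non-spring classes.
posteriors = class_posteriors([1.5, -0.5])
```

Centring does not change the resulting probabilities; it only fixes the otherwise arbitrary offset of the scores.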

Stochastic Gradient Descent (SGD)
It is necessary to introduce a simple supervised learning set-up before introducing the stochastic gradient descent approach. An arbitrary input $x$ and a scalar output $y$ make up an example $z = (x, y)$. In this study, $x$ is the vector of spring-affected factors, and $y$ is the spring/non-spring label. A loss function $h(\hat{y}, y)$ measures the cost of predicting $\hat{y}$ when the actual answer is $y$, and a function $f_\theta(x)$ parameterized by a weight vector $\theta$ is chosen. We then seek the function $f$ that minimizes the loss $D(z, \theta) = h(f_\theta(x), y)$ averaged over the examples:
$$R(f) = \int h(f(x), y)\, dP(z), \qquad R_n(f) = \frac{1}{n} \sum_{i=1}^{n} h(f(x_i), y_i)$$
where $R(f)$ measures the generalization performance, $R_n(f)$ measures the performance on the training dataset, and $P(z)$ is the ground truth distribution of the examples. The SGD algorithm is a drastic simplification: instead of computing the gradient of $R_n(f_\theta)$ exactly, each iteration estimates it from a single randomly picked example [140]. This model can directly optimize the expected risk, since the examples are randomly drawn from the ground truth distribution.
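To make the one-example-at-a-time update concrete, the following sketch fits a logistic model with SGD on toy data. The choice of the log loss, the learning rate, the number of epochs, and the function name `sgd_logistic` are assumptions for illustration; they are not the settings used in the study.

```python
import math
import random

def sgd_logistic(examples, epochs=50, learning_rate=0.1, seed=0):
    """Fit logistic-regression weights with one-example-at-a-time updates."""
    rng = random.Random(seed)
    n_features = len(examples[0][0])
    theta = [0.0] * (n_features + 1)          # last entry is the bias term
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)                      # visit examples in random order
        for x, y in data:
            z = theta[-1] + sum(t * xi for t, xi in zip(theta, x))
            pred = 1.0 / (1.0 + math.exp(-z))
            error = pred - y                   # gradient of the log loss w.r.t. z
            for j, xi in enumerate(x):
                theta[j] -= learning_rate * error * xi
            theta[-1] -= learning_rate * error
    return theta

# Toy data: one factor; springs (1) tend to have larger values.
toy = [([0.1], 0), ([0.3], 0), ([0.4], 0), ([0.6], 1), ([0.8], 1), ([0.9], 1)]
theta = sgd_logistic(toy)
```

Each update uses the gradient of the loss on a single example, so the cost per step does not depend on the size of the training set.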

Support Vector Machine (SVM)
SVMs are a set of machine learning techniques based on the optimal separating hyperplane [141,142]. The goal of the SVM model is to minimize both model complexity and test error. In this case, our aim is to discriminate between spring and non-spring. SVMs separate examples of different classes with a hyperplane defined by a weight vector w and a constant c, where x represents the independent spring-affected factors.

Alternating Decision Tree (ADTree)
ADTree is a generalization of decision trees that combines the boosting algorithm with a decision tree [143,144]. The rule sets of an ADTree are represented graphically as a tree. Each branch leads to an outcome and then on to further rules, starting from the root [145]. When a prediction node is reached, the path continues through all of its children. Once a set of instances reaches a leaf node, a classification is established over them. For numerical prediction purposes, the leaf nodes hold numeric outcomes whose values are computed from a weight that represents the contribution of that node to the final outcome. The final prediction is formed by summing all the weights encountered from the root of the tree [144].
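The summation of prediction values can be illustrated with a tiny hand-built tree. This is a sketch of ADTree scoring only (not of how the tree is learned); the dictionary layout, the factor names, and all thresholds and values are hypothetical.

```python
def adtree_score(node, instance):
    """Sum the prediction values of every prediction node the instance reaches."""
    total = node["value"]
    for splitter in node.get("splitters", []):
        branch = "yes" if splitter["test"](instance) else "no"
        total += adtree_score(splitter[branch], instance)
    return total

# Hypothetical tree; factor names, thresholds, and values are made up.
tree = {
    "value": 0.1,
    "splitters": [
        {"test": lambda s: s["twi"] > 8.0,
         "yes": {"value": 0.6},
         "no": {"value": -0.4}},
        {"test": lambda s: s["distance_to_river"] < 200.0,
         "yes": {"value": 0.3,
                 "splitters": [
                     {"test": lambda s: s["slope"] < 15.0,
                      "yes": {"value": 0.2},
                      "no": {"value": -0.1}}]},
         "no": {"value": -0.5}},
    ],
}

score = adtree_score(tree, {"twi": 9.2, "distance_to_river": 120.0, "slope": 10.0})
is_spring = score > 0   # the sign of the summed score gives the class
```

Unlike an ordinary decision tree, an instance follows every splitter under a prediction node, so several paths contribute to one score.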

Random Forest (RF)
Random forest (RF), which was first developed by Breiman [146], is a non-parametric model and an extension of the classification and regression trees (CART) algorithm. It produces many classification trees to enhance the prediction performance of the model. In the RF model, the split at each node of a tree is made using a randomized subset of the variables. The output of the RF model is obtained by averaging the results of all trees [147]. The RF model consists of numerous trees, each grown from a bootstrap sample, with the samples left out of the bootstrap used to compute the out-of-bag (OOB) error. The OOB error is an unbiased estimate of the generalization error, as explained and interpreted by Breiman [146]. This technique (bootstrap with OOB error) has several advantages: (1) prevention of over-fitting to the training dataset during modeling; (2) reduced bias and variance because of the large number of trees; (3) reduced correlation among the individual trees as the diversity of the forest increases through the use of a limited number of variables; (4) robust error estimates using the OOB data; and (5) higher prediction performance (Wiesmeier et al. [148]). Breiman [146] and Liaw and Wiener [149] have explained the mathematical equations of the RF in detail. In this study, the RF was used to analyze the relationship between groundwater spring locations as the binary dependent variable (groundwater spring locations (1) and groundwater non-spring locations (0)) and the independent variables: slope angle, curvature, slope aspect, plan curvature, elevation, profile curvature, sediment transport index (STI), rainfall, stream power index (SPI), distance to river, topographic wetness index (TWI), river density, lithology, fault density, distance to fault, permeability, and land use. The RF was used to obtain a probability value for each pixel of the study area to prepare the groundwater spring potential map.

AB Learning Ensemble Techniques
As a kind of ensemble algorithm, AB constructs a composite classifier by sequentially training classifiers.The algorithm was first proposed by Freund and Schapire to improve the performance of weak classifiers [150].
The algorithm assigns a weight to each sample in the training dataset C. Initially, every sample is given an equal weight (1/n), so in the first round all samples have the same chance of being selected. Generating the AB model takes T rounds of training base learners on T different training sample groups G_t (t = 1, 2, ..., T), and the process continues until a termination condition is reached [151].
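The reweighting loop can be sketched as follows. For brevity, this example uses one-level decision stumps as the base learner instead of ADTree, and it follows the standard discrete AdaBoost update; the function names and toy data are hypothetical, and the Weka implementation used in the study may differ in detail.

```python
import math

def train_stump(X, y, w):
    """Pick the (feature, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] > thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def adaboost(X, y, rounds=5):
    """Discrete AdaBoost; labels must be +1 (spring) or -1 (non-spring)."""
    n = len(X)
    w = [1.0 / n] * n                      # equal initial sample weights
    ensemble = []
    for _ in range(rounds):
        err, j, thr, polarity = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, polarity))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        preds = [polarity if row[j] > thr else -polarity for row in X]
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of all stumps in the ensemble."""
    score = sum(a * (pol if row[j] > thr else -pol) for a, j, thr, pol in ensemble)
    return 1 if score > 0 else -1

X = [[0.1], [0.2], [0.3], [0.6], [0.7], [0.9]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, rounds=3)
```

Each round focuses the next base learner on the samples the previous ones got wrong, and the final decision is a weighted vote of all rounds.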

Accuracy Assessment (Validation) and Comparison of Methods
The most important issue in introducing a novel model, and in comparing methods with each other, is assessing performance (classifier performance or model validation). Validation is an essential process in the study of any natural hazard phenomenon; it reflects the predictive power of a model and relates to the comparison of model output with a real-world dataset [152].

Statistical Measures
Generally, there are various statistical criteria for validating machine learning models [153]; in this study, six statistical index-based measures, namely sensitivity (recall), root mean square error (RMSE), specificity, negative predictive value (NPV), accuracy, and positive predictive value (PPV), together with the area under the receiver operating characteristic (ROC) curve (AUROC), were used to compare the predictive capability of the proposed model with that of the benchmark models. Most of these criteria were computed from the contingency table (confusion matrix) shown in Table 3, where TP is the number of pixels correctly classified as positive (spring) and TN is the number of pixels correctly classified as negative (non-spring). FP and FN are the numbers of pixels incorrectly classified as positive (spring) and negative (non-spring), respectively. More specifically, sensitivity (recall) is the proportion of actual spring pixels that are correctly classified as springs, while specificity is the proportion of actual non-spring pixels that are correctly classified as non-springs. Accuracy (efficiency) is the proportion of spring and non-spring pixels that are correctly classified [123,126,154]. PPV and NPV are the probabilities that pixels classified as springs and non-springs, respectively, are correctly classified. RMSE is the error between the estimated and observed values [154]; a smaller RMSE indicates better model performance [155]. The statistical index-based measures were calculated using the following equations:
$$\mathrm{PPV} = \frac{TP}{TP + FP}, \qquad \mathrm{NPV} = \frac{TN}{TN + FN}$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(X_{\mathrm{predicted}} - X_{\mathrm{actual}}\right)^2}$$
where $n$ is the total number of samples in a dataset; $X_{\mathrm{predicted}}$ is the predicted value in the dataset; and $X_{\mathrm{actual}}$ is the actual (output) value.
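The confusion-matrix measures and the RMSE can be computed with a few lines of Python, assuming their standard definitions. The counts and scores below are hypothetical and are not results from this study.

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Standard measures derived from a 2 x 2 confusion matrix."""
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

def rmse(predicted, actual):
    """Root mean square error between predicted and actual values."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

# Hypothetical counts and scores, not results from this study.
m = confusion_metrics(tp=80, tn=70, fp=30, fn=20)
error = rmse([0.9, 0.2, 0.6], [1, 0, 1])
```

Sensitivity and specificity are normalized by the actual class totals, whereas PPV and NPV are normalized by the predicted class totals, which is why the two pairs can differ considerably.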

Receiver Operating Characteristics Curve (ROC)
The receiver operating characteristic curve was first suggested by Spackman [156] to evaluate the performance of empirical learning systems [105]. It is another statistical tool and a popular, highly useful graphical representation of model performance [157]. Graphically, it is plotted on two axes: the x-axis shows the false positive rate, or 100 − specificity (FPR = FP/(FP + TN) = FP/N), and the y-axis shows the true positive rate, or sensitivity (TPR = TP/(TP + FN) = TP/P), where P and N are the total numbers of actual positive and negative pixels [158]. In machine learning, ROC is a flexible and robust framework for evaluating the performance of a classifier [159,160]. The ROC is quantified using the area under the curve (AUC), which is widely used as a measure of classification performance [153]. It is preferable to other performance metrics when no threshold is fixed and applied to the scores, and it is invariant to changes in cost and class distribution [161]. For the optimal classifier (perfect model), the AUC has a value of 1, while for a random classifier (inaccurate model) a value of 0.5 is obtained [162].
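One way to obtain the AUC without plotting the curve is to use its rank interpretation: the probability that a randomly chosen spring pixel receives a higher score than a randomly chosen non-spring pixel. The sketch below assumes that interpretation; the function name `roc_auc` and the scores are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical potential indices with their true labels (1 = spring).
auc = roc_auc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0])
```

This pairwise count equals the area obtained by sweeping a threshold over the scores, and it makes clear why no fixed threshold is needed.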

Statistical Assessment
To check for statistically significant differences between the groundwater spring potential models, the Friedman and Wilcoxon signed-rank tests were used in this study. The Friedman test, a non-parametric test, is based on the null hypothesis that there is no difference between the performances of the groundwater spring models at the significance level of α = 0.05. The p-value is used to evaluate this hypothesis: if the p-value is below the significance level, the null hypothesis is rejected, which indicates a significant difference between the models, and vice versa [58]. However, the Friedman test cannot perform pairwise comparisons between the models. Therefore, the Wilcoxon signed-rank test was used to evaluate the systematic pairwise differences between the models. In general, the null hypothesis is rejected if the p-value is <0.05 and the z-value falls outside the range from −1.96 to +1.96 [58].
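The pairwise z-value can be illustrated with a simplified Wilcoxon signed-rank computation. This sketch uses the normal approximation without tie or continuity corrections, so its values may differ slightly from those of a statistics package; the function name and the paired scores are hypothetical.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Return (W+, z) for paired samples using the normal approximation.

    Zero differences are dropped and tied |differences| share average ranks;
    no tie or continuity correction is applied to the variance.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1          # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, (w_plus - mean) / sd

# Hypothetical paired scores (in percent) of two models on the same samples.
model_a = [10, 12, 15, 20, 25, 30]
model_b = [9, 10, 12, 16, 20, 36]
w_plus, z = wilcoxon_signed_rank(model_a, model_b)
significant = abs(z) > 1.96
```

With so few pairs the normal approximation is rough; it is used here only to show where the ±1.96 cut-off comes from.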

Selection of Training Factors Using Chi-Square Technique
The chi-square statistical test was employed to select training factors among the attributes (conditioning factors). It is a traditional statistic for measuring the relationship between two variables (factors) in a contingency table. The chi-square test compares the observed and expected frequencies of variables: the greater the chi-square value for a variable, the stronger the relationship. The results of this test were obtained using the following basic function:
$$\chi^2 = \sum_{i} \sum_{j} \frac{\left(O_{ij} - E_{ij}\right)^2}{E_{ij}}$$
where $O_{ij}$ is the observed value and $E_{ij}$ is the expected value for each cell in the contingency table. We used the chi-square statistical test to assess the independence between spring and non-spring locations and the conditioning factors. If $\chi^2$ equaled 0, it was assumed that there was no association between them, and the conditioning factor was eliminated from model training.
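As a worked illustration, the statistic can be computed directly from a table of observed counts, assuming the standard Pearson form with expected counts taken from the row and column totals. The table below is hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / grand_total
            stat += (o_ij - e_ij) ** 2 / e_ij
    return stat

# Hypothetical counts: rows = spring / non-spring, columns = classes of one factor.
table = [[30, 50, 20],
         [50, 30, 20]]
chi2 = chi_square(table)
```

A factor whose classes contain springs and non-springs in the same proportions gives a statistic of 0, which is the case treated as "no association" in the text.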

Groundwater Spring Conditioning Factor Analysis
Both models and input data affect the quality of GSPM results [15]. The influence of conditioning factors on groundwater spring occurrence is different, such that some of them may reduce the model accuracy. The main step in the spatial GSPM is the selection of suitable factors and the elimination of irrelevant conditioning factors to find the most reliable database. In this study, the chi-square attribute evaluation (CSAE) technique, which is one of the most efficient and popular methods [163], with 10-fold cross validation for the training dataset was used to assess the prediction capability of conditioning factors. Results of the chi-square test (Figure 3) show that the most important conditioning factors for groundwater spring potential were TWI (AM = 98.598), followed by distance from river (AM = 97.


Model Training and Assessment
The results of the seven models, namely AB-ADTree, ADTree, SGD, LMT, SVM, LR, and RF, constructed for groundwater spring potential prediction using the selected conditioning factors and the training dataset are shown in Table 4. On the training dataset, the hybrid model (AB-ADTree) had the highest performance based on the PPV, NPV, sensitivity, specificity, accuracy, kappa, AUC, and RMSE criteria, which shows that the hybrid model outperformed the individual models. AB-ADTree had the highest PPV (0.815), followed by ADTree (0.751), RF (0.749), LMT (0.746), LR (0.746), SVM (0.745), and SGD (0.724), indicating that these models correctly classified pixels in the groundwater spring occurrence class in 81.5%, 75.1%, 74.9%, 74.6%, 74.6%, 74.5%, and 72.4% of the cases, respectively.
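The statistical-index measures named above follow directly from the confusion matrix counts (TP, TN, FP, FN). The sketch below shows the standard definitions; the function name and the example counts are hypothetical and do not reproduce the values in Table 4.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix measures for a binary spring / non-spring classifier."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Agreement expected by chance, used for Cohen's kappa
    chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    return {
        "ppv": tp / (tp + fp),          # predicted springs that are springs
        "npv": tn / (tn + fn),          # predicted non-springs that are non-springs
        "sensitivity": tp / (tp + fn),  # springs that were detected
        "specificity": tn / (tn + fp),  # non-springs that were detected
        "accuracy": accuracy,
        "kappa": (accuracy - chance) / (1 - chance),
    }


# Hypothetical counts for illustration only
m = classification_metrics(tp=80, tn=70, fp=20, fn=30)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

RMSE is omitted here because it requires the predicted probabilities rather than the class counts.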

Groundwater Spring Potential Mapping
After successful modeling in the training phase, the AB-ADTree, ADTree, SGD, LMT, SVM, LR, and RF models were used to calculate the groundwater spring potential index for all pixels. Exported in GIS format, these indices were visualized in five susceptibility classes of groundwater spring potential: very low susceptibility (VLS), low susceptibility (LS), moderate susceptibility (MS), high susceptibility (HS), and very high susceptibility (VHS). Several classification methods can be used to classify potential indices; in the current research, the quantile method was used, based on the literature review and the nature of the data.
The AB-ADTree, ADTree, SGD, LMT, SVM, LR, and RF models were used to prepare the GSPM for the Chilgazi watershed. The maps were prepared in two main steps: generating the groundwater spring potential indices and reclassifying them. In the first step, a unique susceptibility index was assigned to each pixel of the research area; in the second step, these indices were classified into different classes using the quantile method [164], resulting in seven maps, one per model (Figure 4). The results show that the western part of the Chilgazi watershed had a higher potential for spring occurrence than other parts.
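The quantile reclassification step can be sketched as follows. This is a simplified illustration that assumes equal-count classes with break values taken directly from the sorted indices; GIS packages may place quantile breaks slightly differently. The function names and the example indices are hypothetical.

```python
from bisect import bisect_right

LABELS = ["VLS", "LS", "MS", "HS", "VHS"]


def quantile_breaks(values, n_classes=5):
    """Break values that split the sorted data into classes of (nearly) equal size."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_classes] for k in range(1, n_classes)]


def classify(value, breaks):
    """Map a potential index to its susceptibility class label."""
    return LABELS[bisect_right(breaks, value)]


# Hypothetical potential indices for ten pixels
indices = [i / 100 for i in range(5, 100, 10)]  # 0.05, 0.15, ..., 0.95
breaks = quantile_breaks(indices)
classes = [classify(v, breaks) for v in indices]
print(breaks)
print(classes)
```

With ten pixels and five classes, each class receives two pixels, which is the defining property of the quantile method.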

GSPM Validation and Comparison
The reliability of the spring potential maps was evaluated using success and prediction rates (Figure 5). For this purpose, the training and validation datasets were overlaid on the GSPM, and the AUC was calculated for each. According to Figure 5a (success rate curve), the hybrid model, AB-ADTree, had the highest goodness-of-fit based on the training dataset (AUC = 0.846). This implies that, under the present conditions of the study area, this model could appropriately distinguish the areas with high groundwater spring potential; in other words, most of the spring locations fell in the high and very high potential areas of the map. It was followed by the RF (AUC = 0.812), LR (AUC = 0.818), LMT and SGD (both AUC = 0.811), and SVM (AUC = 0.809) models. This means that the ability of the RF model to classify and detect the areas with high groundwater potential is higher than that of the LR, LMT, SGD, and SVM models.
For the prediction rate (model validation), which was built using the validation dataset, the highest AUC belonged to AB-ADTree (0.815), followed by LMT (0.808), RF (0.804), LR (0.803), and SGD and SVM (0.790). Therefore, the map resulting from the novel hybrid model was ranked as the most accurate and reliable among the maps. Yesilnacar [165] classified the success of a model using a quantitative-qualitative relationship. Basically, if a model has an AUC between 0.9 and 1, its prediction accuracy is excellent, and for 0.8-0.9, 0.7-0.8, 0.6-0.7, and 0.5-0.6, the prediction accuracy is very good, good, average, and poor, respectively. Regarding this classification and Figure 5b, the findings indicate that all machine learning models had at least good prediction power in groundwater potential mapping, although the ability of the AB-ADTree, LMT, RF, and LR models was relatively higher than that of the ADTree, SGD, and SVM models in the study area.
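To make the AUC and its qualitative rating concrete, the sketch below computes the AUC as the probability that a randomly chosen spring pixel receives a higher potential index than a randomly chosen non-spring pixel, and then applies the rating scale attributed to Yesilnacar [165] above. The function names, the example scores, the handling of values exactly on a class boundary, and the label for values below 0.5 are assumptions made for illustration.

```python
def auc(scores_pos, scores_neg):
    """AUC via pairwise comparison; ties between a positive and a negative count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def rate_auc(value):
    """Qualitative rating of an AUC value; boundary values go to the higher class (assumption)."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "very good"
    if value >= 0.7:
        return "good"
    if value >= 0.6:
        return "average"
    if value >= 0.5:
        return "poor"
    return "unrated"  # below 0.5 is outside the scale quoted in the text


# Hypothetical potential indices at spring and non-spring locations
spring_scores = [0.9, 0.8, 0.7, 0.4]
non_spring_scores = [0.6, 0.3, 0.2, 0.1]
print(auc(spring_scores, non_spring_scores))  # 0.9375

# Validation AUC values reported in the text
print(rate_auc(0.815))  # very good
print(rate_auc(0.790))  # good
```

The pairwise form is quadratic in the number of pixels, so it suits small illustrations rather than full rasters.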

Similarities Between Prediction Power of Models
The seven models used in this study showed good to very good prediction abilities, but it remained to be determined whether there were statistically significant differences between them. The Friedman test was used at the 5% significance level for this purpose, and the mean ranking of the seven models for the study area is shown in Table 6. Because the p-value was 0.000, i.e., less than 0.05, the null hypothesis was rejected, indicating that there were statistically significant differences between the seven models. The Friedman test, however, does not provide pairwise comparisons between the models. Therefore, the Wilcoxon signed-rank test was carried out to check the statistical significance of pairwise differences between the GSPM models at the 5% significance level. The p-value and z-value were used to evaluate the statistically significant differences between models. The results can be seen in Table 7. Because the p-value in all of the pairwise comparisons was less than 0.05 (0.000) and the z-values fell outside the critical range (from −1.96 to +1.96), the null hypothesis was rejected, implying that the performances of the seven GSPM models were significantly different from each other.
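The two tests described above can be sketched in a few lines. The code computes the Friedman statistic with mean ranks, and the z-value of the Wilcoxon signed-rank test under the normal approximation. It assumes that each row holds the outputs of all models for one paired observation, omits tie corrections, and uses hypothetical function names and data; with as few pairs as in this example the normal approximation is rough, and statistical software would normally be used in practice.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman(rows):
    """Friedman statistic and mean rank per model (rows = observations, columns = models)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, [r / n for r in rank_sums]


def wilcoxon_z(a, b):
    """z-value of the Wilcoxon signed-rank test for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean) / sd


# Hypothetical outputs of three models on four paired observations
rows = [[0.90, 0.80, 0.70], [0.85, 0.75, 0.60], [0.95, 0.70, 0.65], [0.80, 0.78, 0.50]]
q, mean_ranks = friedman(rows)
z = wilcoxon_z([r[0] for r in rows], [r[2] for r in rows])
print(q, mean_ranks, z)
```

A |z| above 1.96 corresponds to rejection at the 5% level in a two-tailed test, which is the criterion applied in the text.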

Discussion
Recognizing areas with sufficient potential for groundwater exploration based on spring density is a significant task in water resources management, especially in semi-arid watersheds such as Chilgazi. Machine learning and ensemble techniques can be used as alternative and effective tools for preparing GSPM due to their ability and flexibility. This study applied and extended a hybrid machine learning algorithm, AB-ADTree, for this purpose, and the results were compared and validated using statistical metrics and several soft computing benchmark models. Factor selection using the chi-square attribute evaluation (CSAE) technique indicated that, among 17 conditioning factors, only 13 were significant and were considered for modeling, with TWI being the most important. TWI indicates the topographic wetness of the ground surface: the higher the TWI, the higher the probability that the water table is close to the ground surface. Springs occur in regions where the water table reaches the ground surface. Naghibi and Dashtpagerdi [166] reported that, according to the generalized cross validation technique, TWI, slope angle, and fault density were the more important factors for GSPM in their study area. Additionally, some conditioning factors, including aspect, profile curvature, permeability, fault density, and distance to fault, were removed from the modeling in the training phase because their chi-square values were zero. There are two possible reasons for this: (i) the removed conditioning factors may not contribute to explaining the spatial distribution of springs in the study area; (ii) the cartography or method used to extract the removed conditioning factors may not have reflected them properly.
The modeling results show that the proposed ensemble model and the benchmark machine learning models performed satisfactorily for groundwater spring potential mapping. The areas under the ROC curves show that all models had an AUC from 0.790 to 0.815, indicating that although all models had high performance and prediction accuracy, the proposed model, AB-ADTree, outperformed the other benchmark models (ADTree, SGD, LR, LMT, SVM, and RF). In line with this, the model has produced acceptable results in other environmental fields, such as groundwater well potential mapping [166], ecological modeling [167], landslide susceptibility modeling [168], and flood susceptibility modeling [169,170]. The findings indicate that the AB-ADTree ensemble model had a better fit to the training dataset during the modeling process and, in turn, a high prediction accuracy. In other words, adaptive boosting, known as AdaBoost, randomly divided the training dataset into several sub-training datasets. Then, an ADTree was fitted on each dataset and, finally, an output was obtained as a weighted sum of all the ADTree base models [171]. This process improved the goodness-of-fit and prediction accuracy of the ADTree by decreasing over-fitting, and also errors, in the training dataset [94,172,173].
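The weighted-sum combination described above can be illustrated with a minimal sketch of AdaBoost in its classic sample-reweighting form. Two points are assumptions rather than the study's implementation: decision stumps (single-threshold rules) stand in for the ADTree base learners to keep the example short, and the training set is reweighted after each round instead of being resampled. All names and the toy data are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Best single-feature threshold rule under sample weights w; labels are +1 / -1."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for sign in (1, -1):
                pred = [sign if row[f] > thr else -sign for row in X]
                err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                if best is None or err < best[0]:
                    best = (err, f, thr, sign)
    return best


def stump_predict(stump, row):
    _, f, thr, sign = stump
    return sign if row[f] > thr else -sign


def adaboost(X, y, rounds):
    """Train an ensemble of weighted stumps; returns a list of (alpha, stump) pairs."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:
            break  # no weak learner better than chance remains
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest
        w = [wi * math.exp(-alpha * t * stump_predict(stump, row))
             for wi, row, t in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    """Sign of the weighted sum of the base-learner votes."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single threshold can separate
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
single = adaboost(X, y, rounds=1)
boosted = adaboost(X, y, rounds=3)
print([predict(single, row) for row in X])
print([predict(boosted, row) for row in X])
```

On this toy set a single stump misclassifies two samples, whereas the weighted sum of three stumps reproduces all six labels, which is the kind of improvement in fit that the paragraph attributes to the ensemble.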
Among the benchmark machine learning models, the RF model had the highest prediction accuracy (AUC = 0.809), followed by the LMT, LR, SGD, SVM, and ADTree models. The LMT is a decision tree classifier that combines a linear logistic regression model with a decision tree, and it had higher performance in this study. The LR, SVM, and SGD models, on the other hand, rely on a fitted function to obtain a weight for each conditioning factor in order to spatially predict the groundwater potential. These results were obtained for the current study area and may differ in other regions. Such differences reflect the uncertainties that data and model selection introduce into the modeling process. In other words, the data (conditioning factors) differ from one region to another, which leads to different modeling results. Additionally, the results of one model can differ substantially from those of another model in a given region with similar conditioning factors. Hence, each model should be tested and evaluated under its own conditions, and the model with the highest predictive power should be selected. The main goal was to obtain a high-precision groundwater potential map that allows areas with high groundwater potential to be identified for future use in critical conditions, such as drought. Therefore, we tested the proposed model and other models for GSPM; the proposed model can be used in other regions with similar environmental conditions, albeit with more caution.

Conclusions
Springs, as groundwater resources, are important for many sectors, such as domestic consumption and agriculture, in arid and semi-arid areas of the world. Sometimes, several families or villages depend heavily on a single spring; therefore, the spatial modeling of springs is necessary. Different approaches can be taken for this type of modeling. We designed a hybrid machine learning approach called AB-ADTree to deal with this issue. Hydrological factors, including TWI, distance to river, and SPI, together with topographic, geological, and land cover factors, were first evaluated as the most influential conditioning factors for GSPM based on the chi-square attribute evaluation (CSAE) factor selection technique. This indicates that these factors can be applied to explore groundwater in the study area and in similar semi-arid regions.
To model GSPM, we selected the ADTree algorithm and then built its ensemble based on the AB algorithm. This resulted in a hybrid machine learning model, AB-ADTree, for spatially predicting groundwater spring locations in the Chilgazi watershed, Kurdistan province, Iran. The efficiency of this approach was verified against several soft computing benchmark algorithms, namely SGD, LMT, LR, SVM, and RF. The hybrid model was successfully trained and evaluated, and it ranked highest on the testing criteria, including PPV, NPV, sensitivity, specificity, accuracy, RMSE, and AUC, in both the training and validation datasets. GSPM maps were generated by all of the applied models and evaluated by AUROC. The map generated by the hybrid model had the highest prediction accuracy in comparison to the other models. The Friedman and Wilcoxon signed-rank statistical tests were used for further confirmation of the results. The findings indicate that the hybrid model, AB-ADTree, can be considered a promising technique for mapping groundwater potential under the conditions of the study area; for other regions, it should be further tested and evaluated. Moreover, it can be useful for decision makers, planners, managers, and government agencies for the sustainable management of groundwater resources.

Figure 1. Location of the research area and groundwater springs.

Water 2019, 11, 2013

"AB-ADTree" Model
In this study, we combined a decision tree classifier, the alternating decision tree (ADTree), with a meta/ensemble classifier, AB, into a model named "AB-ADTree" in order to spatially predict springs. The framework of the proposed ensemble model is shown in Figure 2. Basically, the GSPM using the proposed model was performed in five steps: (1) data collection and interpretation; (2) selecting the most important conditioning factors using the chi-square technique; (3) training the AB-ADTree ensemble model; (4) validating and comparing the spring models; and (5) preparing groundwater potential maps.

Figure 2. The flowchart of the methodology for groundwater spring potential mapping. Abbreviations: ROC, receiver operating characteristic.

Figure 3. Most effective conditioning factors for groundwater spring potential mapping.

Figure 5. Area under the ROC curve (AUROC) of the seven models for groundwater spring potential mapping (GSPM) using training (a) and validation (b) datasets.

Table 1. Statistical measures of springs in the study area.

Table 2. Groundwater spring conditioning factors and their classifications for modeling groundwater spring potential mapping (GSPM) at the Chilgazi watershed. Abbreviations: SPI, stream power index; TWI, topographic wetness index; LS, low susceptibility; STI, sediment transport index.

Table 3. Confusion matrix. Abbreviations: TP, number of pixels correctly classified as positive (springs) predictions; TN, number of pixels correctly classified as negative (non-springs) predictions; FP, number of pixels incorrectly classified as positive (springs) predictions; FN, number of pixels incorrectly classified as negative (non-springs) predictions.

Table 6. Average ranking of the seven groundwater spring potential models (GSPM) using the Friedman test.

Table 7. Performance of the seven groundwater spring potential models (GSPM) using the Wilcoxon signed-rank test (two-tailed). The significance level is 0.05. NPD: number of positive differences; NND: number of negative differences.