Modeling of Groundwater Potential Using Cloud Computing Platform: A Case Study from Nineveh Plain, Northern Iraq

: Knowledge of the groundwater potential, especially in an arid region, can play a major role in planning the sustainable management of groundwater resources. In this study, nine machine learning (ML) algorithms—namely, Artiﬁcial Neural Network (ANN), Decision Jungle (DJ), Averaged Perceptron (AP), Bayes Point Machine (BPM), Decision Forest (DF), Locally-Deep Support Vector Machine (LD-SVM), Boosted Decision Tree (BDT), Logistic Regression (LG), and Support Vector Machine (SVM)—were run on the Microsoft Azure cloud computing platform to model the groundwater potential. We investigated the relationship between 512 operating boreholes with a speciﬁed speciﬁc capacity and 14 groundwater-inﬂuencing occurrence factors. The unconﬁned aquifer in the Nineveh plain, Mosul Governorate, northern Iraq, was used as a case study. The groundwater-inﬂuencing factors used included elevation, slope, curvature, topographic wetness index, stream power index, soil, land use/land cover (LULC), geology, drainage density, aquifer saturated thickness, aquifer hydraulic conductivity, aquifer speciﬁc yield, depth to groundwater, distance to faults, and fault density. Analysis of the contribution of these factors in groundwater potential using information gain ratio indicated that aquifer saturated thickness, rainfall, hydraulic conductivity, depth to groundwater, speciﬁc yield, and elevation were the most important factors (average merit > 0.1), followed by geology, fault density, drainage density, soil, LULC, and distance to faults (average merit < 0.1). The average merits for the remaining factors were zero, and thus, these factors were removed from the analysis. When the selected ML classiﬁers were used to estimate groundwater potential in the Azure cloud computing environment, the DJ and BDT models performed the best in terms of all statistical error measures used (accuracy, precision, recall, F-score, and area under the receiver operating characteristics curve), followed by DF and LD-SVM. The probability of groundwater potential from these algorithms was mapped and visualized into ﬁve groundwater potential zones: very low, low, moderate, high, and very high, which correspond to the northern (very low to low), southern (moderate), and middle (high to very high) portions of the study area. Using a cloud computing service provides an improved platform for quickly and cheaply running and testing different algorithms for predicting groundwater potential.


Introduction
Groundwater is a vital resource for supplying drinking water for millions of people around the world, as well as for agriculture, industry, and preserving the natural environment. Relative to surface water, groundwater has a range of important benefits: it is less susceptible to seasonal and perennial variability, often more evenly dispersed over large areas, more insulated against pollution, and of acceptable quality for most common uses, such as drinking and irrigation [1]. Groundwater also acts as a key strategic reservoir during droughts [2]. Notwithstanding these benefits, groundwater is often under-monitored and under-controlled in comparison with more visible surface water. As a result, it is difficult to get a good picture of groundwater's temporal and spatial availability, which is the most important requirement for groundwater management. Hydrogeological mapping is a tool for comprehensive groundwater resource development and planning. Hydrogeological maps are necessary for collecting and analyzing aquifer and related geological data in order to establish a three-dimensional representation of the subsurface environment where groundwater exists [3]. Furthermore, mapping aids in determining the exposure of aquifers and their related ecosystems to pollution and overexploitation, as well as identifying areas for artificial recharge and communicating information to groundwater users [4]. Traditional groundwater mapping requires a lot of effort and money, especially in remote areas or developing countries. This necessitates the development of new methods to make groundwater exploration and assessment as effective as possible. One of these methods is the assessment of groundwater potential by relying on geographic information systems (GIS), which has also been associated with the use of remote sensing techniques and global positioning system technology, machine learning (ML), and soft computing techniques [5][6][7]. The term groundwater potential denotes the amount of groundwater available in an area, and it is a function of several hydrologic and hydrogeological factors [8]. This definition, from our point of view, is too simple, as the groundwater potential in an area is the outcome of a complex process that is not influenced by hydrological and hydrogeological elements. Many other factors such as geology and structural setting, geomorphology, and topography can influence groundwater accumulation. The most accurate definition should be as follows: groundwater potential denotes the amount of groundwater available in an area, and it is a function of several surface and subsurface factors.
Martínez-Santos and Renard [9] provide a more practical definition for this term as "an application of predictive mapping, a forecasting technique that consists in developing spatially distributed estimates for a target variable based on a series of indirect indicators (explanatory variables)". The target outcome of groundwater potential maps is the feasibility of drilling successful boreholes in different parts of a given region. In general, a closer look at the literature shows two different meanings for groundwater potential. Some researchers have used groundwater potential analysis as an exploration tool by assuming that the study of surficial factors such as topography, geology, geomorphology, soil, land use/land cover (LULC), drainage characteristics, lineament density, and proximity to surface-water bodies offer an indirect exploration tool to find where the groundwater is more likely to occur [10][11][12]. Others have contended that groundwater potential maps show variation in groundwater storage across a given region and thus provide information on groundwater availability and productivity [5,13,14].
Worldwide, researchers have used three types of techniques to model groundwater potential: data-knowledge, data-driven, and hybrid models of knowledge and data-driven techniques [15]. In data-knowledge models, such as simple overlay technique, fuzzy logic, and multi-criteria decision making (MCDM), a specific number of groundwater-affecting occurrence factors are combined to generate the groundwater potential map [16][17][18][19]. In data-driven models, such as bivariate and multivariate statistical models, ML classifiers, and hybrids of these three models, or with knowledge-driven models, the relationship between the locations of wells with specified pumping capacity or specific capacity (Sc) and the groundwater influential occurrence factors is explored to map groundwater potential [5,13,[20][21][22][23][24]. The locations of the operating groundwater wells, in this case, are taken as the target (dependent) variable and the groundwater influential occurrence factors are taken as predictors (independent variable).
Cloud computing services enable the development of virtual servers, referred to as instances, inside a centralized computing facility operated by service providers such as Amazon, Microsoft, or Google, among others [25]. Cloud computing encompasses all the technologies provided as services over the Internet, as well as the hardware and software in the data centers that support those services [26]. The key benefit of cloud computing is that it allows users to access huge computational resources without having to invest in conventional servers or distributed computing infrastructure. The major benefit of the cloud is that any researcher with an internet connection, a username, a password, and a system may access the final output from any location in the world [27]. Anyone may perform the reengineering and test the results if necessary. The final assessed result can be made available to an application that can use it as a web service. Hayley [25] discussed how cloud computing can be effectively used for numerical modeling of groundwater flow and automatic calibration of groundwater simulation models that need vast resources to implement [28,29]. In this study, we use cloud computing in GIS-based groundwater potential mapping and examine the opportunity given by this remote, highly efficient platform instead of using stand-alone applications. Wang et al. [30] presented the only available study concerning the use of cloud computing to study groundwater potential mapping, employing a simple overlay technique supported by the analytical hierarchy process (AHP) method and employing only a few surface thematic layers such as topographic slope, aspect, water-density, land surface temperature, and normalized difference vegetation index (NDVI). The authors of the previous article used the groundwater potential maps as an exploration tool to indicate where the groundwater was likely to occur.
In this study, the Microsoft Azure cloud computing platform was used for modeling groundwater productivity in the Nineveh Plain, northern Iraq. Nine ML algorithmsnamely, Artificial Neural Network (ANN), Decision Jungle (DJ), Averaged Perceptron (AP), Bayes Point Machine (BPM), Decision Forest (DF), Locally-Deep Support Vector Machine (LD-SVM), Boosted Decision Tree (BDT), Logistic Regression (LG), and Support Vector Machine (SVM)-were used for modeling the relationship between the locations of 520 boreholes with specific capacity data and 14 groundwater-influencing occurrence factors. The primary focus was to demarcate groundwater potential areas-e.g., areas with a sufficient yield for groundwater abstraction for a given region. The novelty of this work is that it is the first-time cloud computing has been used to reveal groundwater potential (storage variation) using both surface and subsurface groundwater influential occurrence factors, whereas in most groundwater potential studies, the subsurface factors (aquifer related factors) have been ignored.

The Study Area
The study area (Nineveh Plain) is located in northeastern Mosul governorate, northern Iraq, between 36 • 47 27.47 N-35 • 59 3.57 N latitude and 42 • 44 33.51 E-43 • 33 33.91 E longitude. It covers an area of 2540 km 2 . It extends from Mosul Lake in the north to the Great Zab River in the south and is bordered by the Tigris River on the west (Figure 1). The elevation ranges from 1047 m above mean sea level (amsl) in the northeastern part of the study area, within the Alqoosh and Ain Sifni anticlines, to 190 m amsl in the Tigris River basin to the southwest. In general, the land surface is relatively flat in the middle and south of the area, with some hills and valleys in the north and northeast (Figure 1). The ages of exposed formations in the study area range from Middle-Upper Eocene to Quaternary. The youngest formations are exposed in the middle and south of the study area, while the oldest formations appear within the folded rocks in the north and northeast of the study (Figure 2). Table 1 gives a brief overview of the exposed formations in the study area (see [30] for more information). The study area contains many complex structures in the northern and northeastern parts (Figure 2), including several major faults near the folds. The axes of the folds represent hydrologic divides. Geological formations that are exposed on the flanks of these folds represent recharge areas the Pila Spi Formation, which is jointed and fractured in the folded area. Six LULC classes are recognized in the study area: Urban, Agriculture, Grassland/Pasture, Rocky, Forest, and Water Bodies ( Figure 3). Agriculture covers about 1769 km 2 (69.4%), while the other classes cover 771 km 2 (30.6%). The soil textures in the study area were mapped using 36 soil samples using the USDA web-based soil texture calculator (https://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey) in 2 April 2020. Based on the assigned texture, the soils were classified into hydrological groups and mapped using the Thiessen polygon method (Figure 4).
River basin to the southwest. In general, the land surface is relatively flat in the middle and south of the area, with some hills and valleys in the north and northeast (Figure 1). The ages of exposed formations in the study area range from Middle-Upper Eocene to Quaternary. The youngest formations are exposed in the middle and south of the study area, while the oldest formations appear within the folded rocks in the north and northeast of the study (Figure 2). Table 1 gives a brief overview of the exposed formations in the study area (see [30] for more information). The study area contains many complex structures in the northern and northeastern parts (Figure 2), including several major faults near the folds. The axes of the folds represent hydrologic divides. Geological formations that are exposed on the flanks of these folds represent recharge areas the Pila Spi Formation, which is jointed and fractured in the folded area. Six LULC classes are recognized in the study area: Urban, Agriculture, Grassland/Pasture, Rocky, Forest, and Water Bodies (Figure 3). Agriculture covers about 1769 km 2 (69.4%), while the other classes cover 771 km 2 (30.6%). The soil textures in the study area were mapped using 36 soil samples using the USDA web-based soil texture calculator (https://www. nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey) in April 2 2020. Based on the assigned texture, the soils were classified into hydrological groups and mapped using the Thiessen polygon method (Figure 4).           The construction of the conceptual model for study area was based on the records data for boreholes obtained from the Groundwater Department in the city of Mosul. These data represent characteristics of a well, including construction data, well owner, well depth and geographic coordinates, lithology and stratigraphy data including intervals, material types, and well screens position. Additionally, information such as depth to groundwater, drawdown, pumping test data, and well discharge were also available. Thirty-four wells were selected to represent the study area. These wells had a semi-completed record, and there was an even distribution of these wells through the area. The main water-bearing layers in the study area are located in the Quaternary and Tertiary formations (Mukdadiyah, Bai Hassan, and Injana Formations). Due to the lack of a regional confining unit in the study area and the existence of numerous joints, faults, and fractures that allow hydraulic connections between these formations, these water-bearing layers can be considered as a single unconfined aquifer system, Figure 5. The groundwater depth ranges from 11 to 66 m below land surface (bls) and averages 28.81 m bls. Figure 6a shows the spatial distribution of the groundwater depths in the study area based on the measurements made by the authors in 58 wells evenly distributed throughout the study area. The depth to groundwater increases from southeast to northeast. Overall, the shallow depths occur in the plain, and the greater depths appear in the folded areas, especially where the oldest formations are exposed. Groundwater levels (heads) were estimated by subtracting the ground elevation from the depth to groundwater available at 58 wells, and the result was interpolated by Empirical Bayesian Kriging technique to reveal the spatial distribution of groundwater level over the study area, Figure 7. The groundwater levels are high in the north and northeastern parts of the study area and decrease towards the Basin of Tigris River. The north and northeast areas of the study area represent the recharge zone while south and southwest areas represent discharge areas.    The spatial distribution of the saturated thickness of this aquifer is shown in Figure 6b. There is a clear decrease in the saturated thickness of the aquifer from the north and northeast to the west and southwest. The structural setting (the distribution of folds and faults) plays a major role in controlling the aquifer saturated thickness, in addition to the proximity to the recharge area.

Tigris River
The pumping test data of 14 wells in the study area was used to infer the hydraulic characteristics of the main aquifer, i.e., the hydraulic conductivity and specific yield. The estimated hydraulic conductivity was interpolated using inverse distance weighting to reveal its spatial distribution (Figure 6c). The hydraulic conductivity is low in the north and increases towards the Tigris River in the south and west, where there are high-permeability Quaternary and Tertiary sediments that mainly consist of pebbly sands. The transmissivity is higher in the middle of the study area and the south than in the northern and eastern parts (Figure 6d), which are mainly occupied by lower-permeability fractured rocks. The specific yield of the aquifer ranges from 0.001 to 0.01 (average 0.004), with the relatively high values appearing in the northeast part of the study area and in small spots in the middle and southeast (Figure 6e).

Workflow Steps
The steps for conducting this work are summarized in Figure 8 and consisted of four stages. The inventory map of productive wells was prepared first from well capacity data. This map was used as a target variable in the analysis of groundwater potential. Second, 14 surface and subsurface factors influencing groundwater productivity-namely, hydraulic conductivity, saturated thickness, specific yield, depth to groundwater, rainfall, geology, soil, LULC, topography-related factors (slope, topographic wetness index (TWI), stream power index (SPI)), drainage density, fault density, and distance to faults-were collected from different sources such as field surveys, remote sensing data, and previous works. These factors were used as predictors in the groundwater potential analysis. Third, the relationship between the well inventory map and the influential groundwater occurrence factors was explored using nine ML algorithms through the Microsoft Azure cloud computing platform. Fourth, the performance of the used algorithms was compared using error statistics measures, and the best was selected to map the probability of groundwater potential.

Operating Wells Inventory Map
Water wells in the study area are mainly used for drinking, industrial, and agricultural (including livestock) purposes. The inventory map of the well locations was prepared using the data of the General Commission of Groundwater (GCG), Mosul branch, and an extensive field survey by the authors. We registered and mapped locations of 520 wells to use as the target variable to generate the map of groundwater potential. Values of Sc, which is the pumping rate divided by the drawdown [32], were considered. The Sc can be efficiently used to design the maximum yield of a well and to estimate the transmissivity of the aquifer tapped by the well. The pumping capacity of the well is not a good indicator for determining the productivity of the water-bearing layer because pumps may vary in capacity from one well to another. Therefore, the areas of low productivity may seem to be highly productive if the pumping capacity of the well (flow rate) is high and vice versa [33]. Values of Sc in the study area ranged from 0.006 to 9.994 L/s/m, with an average of 1.000 L/s/m. The Sc of the 520 wells was divided into high-potential (>2 L/s/m) and low potential (≤2 L/s/m) [5]. Wells with high and low productivity status were coded as yes and no, respectively, to use in the classification problem solved in this study. After assigning the codes for the wells, these wells were randomly partitioned into two groups: 70% (364 wells) were used for training the models, and the remaining 30% (156 wells) were allocated for testing ( Figure 9).

Factors Affecting Groundwater Occurrence and Availability
The numbers and types of groundwater-influencing occurrence factors used in the analysis of groundwater potential vary depending on the complexity of the problem to be solved, data availability, and the method in which the groundwater potential map is used, i.e., as an exploration tool or to refer to the groundwater productivity and availability. These factors can be categorized into two main groups: surface and subsurface. The surface factors such as the topography-related factors, exposed lithological units, soil, LULC, and drainage density mainly control the groundwater renewal rate, while subsurface factors determine the aquifer capacity (the ability of the aquifer to store and transmit groundwater) and the capability of an aquifer to respond to external stresses (natural or induced). The subsurface factors, which include the transmissivity, storativity, aquifer type, aquifer architecture, groundwater depth, and presence of structural features such as faults, determine the strategic or fixed groundwater storage. In most cases, especially in arid and semi-arid regions, due to the precipitation scarcity, uneven distribution of precipitation, and absence of perennial lakes and rivers that have direct contact with the aquifer system, the percentage of renewable storage is much lower than the fixed groundwater storage in stock. Therefore, many of the large aquifers in arid and semi-arid regions are depleted and enter a stage of mining due to the imbalance between aquifer inputs and outputs [34]. Mapping of groundwater availability or productivity using only surface factors pertains only to the renewable storage of the aquifer. To map the spatial distribution of groundwater productivity across the study area, we need to consider both the surface and subsurface factors.

Factors Affecting Groundwater Occurrence and Availability
The numbers and types of groundwater-influencing occurrence factors used in the analysis of groundwater potential vary depending on the complexity of the problem to be solved, data availability, and the method in which the groundwater potential map is used, i.e., as an exploration tool or to refer to the groundwater productivity and availability. These factors can be categorized into two main groups: surface and subsurface. The surface factors such as the topography-related factors, exposed lithological units, soil, LULC, and drainage density mainly control the groundwater renewal rate, while subsurface factors determine the aquifer capacity (the ability of the aquifer to store and transmit groundwater) and the capability of an aquifer to respond to external stresses (natural or induced). The subsurface factors, which include the transmissivity, storativity, aquifer type, aquifer architecture, groundwater depth, and presence of structural features such as faults, determine the strategic or fixed groundwater storage. In most cases, especially in arid and semiarid regions, due to the precipitation scarcity, uneven distribution of precipitation, and absence of perennial lakes and rivers that have direct contact with the aquifer system, the percentage of renewable storage is much lower than the fixed groundwater storage in stock. Therefore, many of the large aquifers in arid and semi-arid regions are depleted and enter a stage of mining due to the imbalance between aquifer inputs and outputs [34]. Mapping of groundwater availability or productivity using only surface factors pertains only to the renewable storage of the aquifer. To map the spatial distribution of groundwater productivity across the study area, we need to consider both the surface and subsurface factors.
The elevation, soil, LULC, geology, depth to groundwater, aquifer saturated thickness, hydraulic conductivity, and specific yield were generated using different sources such as the 30 m Shuttle Radar Topography Mission digital elevation model (DEM), archival data, and field test data (Figures 1-4 and 6a-e), Section 2.1. To generate the slope, TWI, and SPI, the same DEM that was used to generate the elevation layer was used after pre-processing (fill sink) and re-projection. The importance of these factors for modeling The elevation, soil, LULC, geology, depth to groundwater, aquifer saturated thickness, hydraulic conductivity, and specific yield were generated using different sources such as the 30 m Shuttle Radar Topography Mission digital elevation model (DEM), archival data, and field test data (Figures 1-4 and Figure 6a-e), Section 2.1. To generate the slope, TWI, and SPI, the same DEM that was used to generate the elevation layer was used after pre-processing (fill sink) and re-projection. The importance of these factors for modeling groundwater potential is comprehensively discussed elsewhere [35][36][37]. The slope thematic layer was directly generated from the DEM (Figure 5f). Both TWI and SPI are used to describe the spatial soil moisture pattern. TWI was developed by [38] and is defined as where α is the local upslope area draining through a certain point per unit contour length, and tan β is the local slope in degrees. TWI reflects the tendency of water to accumulate at any point in a catchment (in terms of α) and the tendency of gravitational forces to move that water downslope (in terms of tan β). The SPI measures the erosive power of the stream and is defined as where A s is a specific catchment area. The TWI and SPI maps are shown in Figure 6g,h. Drainage density (DD) is the total length of all channels (L) in the drainage basin divided by the area of the basin (A): DD represents both the drainage basin's ability to produce surface runoff and the erodibility of the surface material by indicating the extent of terrain dissection by channels. Areas with high DD are generally characterized by low infiltration rate, rapid runoff generation, and moderately erodible surface materials. As a result, when the DD is high, the surface run-off is high, and the quantities of water infiltrated through the soil section, which contribute to groundwater recharge and storage, are reduced. When the DD is low, however, it is expected that water infiltration will be larger due to the availability of adequate circumstances such as topographic characteristics, soil, and land uses, and that this process will affect groundwater recharge. The DD in the middle and northern portions of the study area is relatively dense, indicating that these parts receive less recharge than other parts (Figure 6i).
The structural setting, represented by distance to faults and fault density, plays a role in controlling the fixed groundwater storage by controlling the architecture of the aquifer itself. In addition, faults allow water to enter the subsurface and enhance groundwater storage [40]. The fault features were firstly digitized from the tectonic map of Iraq with a scale of 1:1,000,000. The Euclidian distance command of ArcGIS 10.4 was used to create the distance to faults map (Figure 6j), and the line density command was used to create the fault density map (Figure 6k).
Rainfall is an important factor in delineating groundwater potential because it mainly affects the renewable recharge of the aquifer system. The average annual sum of rainfall recorded at eight stations inside and outside of the study area was used to map the interpolated surface of this factor (Figure 6l) [41]. Overall, the rainfall in the northern part of the study area is higher than in the south, indicating that recharge is higher in the north, taking into consideration the exposed lithology and type of soil.

Feature Selection
Feature selection is the process of selecting a subset of characteristics from a dataset based on their quality, importance, conventions, significance, and constraints [42]. Feature selection helps in (1) simplifying the predictive models so that they can be easily developed and interpreted; (2) shortening the time of model training, and (3) enhancing generalization by reducing model over-fitting. In this study, the information gain ratio was used to quantify which variables were important in the analysis of groundwater potentiality and which should have been discarded to obtain more accurate ML models. Information gain chooses features by assessing each variable's gain in the context of the target variable. The calculation is called mutual information between the two random variables. By dividing a dataset according to a specified value of a random variable, information gain evaluates the reduction in entropy or surprise. A higher information gain implies a lower entropy group or groups of samples, and hence a lower level of surprise [43]. More information about the mathematics of this technique can be found in [44].

Azure Cloud Platform and Machine Learning Classifiers Used
Azure is a cloud computing service for developing, testing, deploying, and managing applications and services through Microsoft-managed data centers [45]. It offers software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS), and it supports a wide range of programming languages, tools, and frameworks, including Microsoft-developed and third-party software.
For this research, nine ML classifiers were used to map the probability of groundwater potential in the study area. The first of these, ANN, is a computer system that mimics the way the human brain analyzes and processes data [46]. An ANN is made up of hundreds or thousands of artificial neurons called processing units that are connected via nodes. Based on an internal weighting mechanism, the input units receive diverse forms and structures of information, and the neural network strives to learn about the information provided to generate one output report [47]. Backpropagation is a set of learning principles used by ANNs to perfect their output results. Many hidden layers can be inserted between the input data and output data layers. Just one or a few hidden layers can perform most predictive tasks quickly.
AP is a special kind of neural network perceptron, which is a classification method that finds a separating hyperplane (a line that linearly divides a dataset) to make predictions. The perceptron is an online algorithm, which means it processes each instance in the training set individually [48]. Using a set of starting weights, the weighted sum of the features is calculated for each sample in the training set. The weights stay unchanged if this value has the same sign as the current example's label. The weights are updated in a generalization of this method by multiplying the feature vector by the learning rate and by the gradient of some loss function [49]. In the AP, a weight vector is produced for each iteration or pass over the training data. After that, the final prediction is determined by averaging the weighted sum of each weight vector and examining the result's sign.
SVM is an advanced supervised machine learning approach to classification and regression problems. It aims to determine the optimum hyperplane for dividing a dataset into distinct classes [50]. The data points near the hyperplane, called support vectors, are important parts of the SVM since deleting them changes the location of the hyperplane substantially. The SVM method selects a hyperplane with the largest possible margin between it and any point in the training dataset, allowing the new data to be correctly categorized. When defining a clear hyperplane is problematic, the two-dimensional data are transformed into three dimensions using kernelling, and the hyperplane becomes a plane [51]. The dataset is continuously mapped into higher and higher dimensions until an optimum segregation hyperplane can be constructed.
LD-SVM is a kind of SVM in which the kernel function for mapping data points to feature space is specially intended to minimize training time while keeping maximal classification accuracy. Because this approach is based on supervised learning, it requires a tagged dataset with a label column.
The BPM is a Bayesian method for linear classification. By selecting one "average" classifier, the Bayes Point efficiently approximates the theoretically optimal Bayesian average of linear classifiers (in terms of generalization performance) [52]. The BPM is not prone to overfitting the training data because it is a Bayesian classification machine.
DF is a fast supervised ensemble model, and it is a good choice for predicting a target with a maximum of two outcomes. The DF algorithm is a classification-focused ensemble learning approach. Ensemble techniques are founded on the basic idea that by developing several related models and integrating them in some way, one may achieve better results and a more generalized model than by depending on a single model. In general, ensemble models outperform single decision trees in terms of coverage and accuracy.
DJ is a recent extension to DF. A DJ consists of an ensemble of decision-directed acyclic graphs (DAGs). A DAG enables numerous pathways from the root to each leaf, unlike traditional decision trees that only allow one path to each node. Node splitting and merging are driven by the minimization of the same objective function during training, in this case, the weighted sum of entropies at the leaves. Results from a variety of datasets demonstrate that DJs need significantly less memory than DFs and various other baselines, while significantly enhancing generalization [53].
A BDT is an ensemble learning approach in which the first tree's errors are corrected by the second tree, the second tree's errors are corrected by the third tree, and so on. The whole ensemble of trees that produces the prediction is used to create the prediction. BDTs are, in general, the simplest approaches for achieving top performance on a wide range of machine learning problems when correctly set. They are, nevertheless, memory-intensive learners. As a result, a BDT model may not be able to handle the huge datasets that linear learners can.
Finally, LG is a well-known statistical approach for predicting the probability of a given event, and it is particularly useful for classification problems. The algorithm predicts the probability of occurrence of an event by fitting data to a logistic function.

Error Statistics Used to Evaluate the Model Performances
Five error measures were used in this study to evaluate the performances of the ML models: accuracy (ACC), precision (P), recall (R), F-score (F), and area under the receiver operating characteristic curve (AUC). ACC is the proportion of cases that are predicted correctly by the model [54]. It is computed in terms of confusion matrix as where, TP is the number of cases that predicted correctly as high potential, FN is the number of cases predicted incorrectly as high potential, TN is the number of cases predicted correctly as low potential, and FP is the number of cases predicted incorrectly as low potential. Precision is the value of overall positive real outcomes (high potential outcomes) and is defined as The recall is the proportion of positive cases that are predicted correctly and is calculated as The F-score is the weighted average of P and R and is computed as Finally, the receiver operating characteristic curve is a graphical plot that illustrates the diagnostic ability of a binary classifier system, as its discrimination threshold is varied. It is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The AUC is a measure of the usefulness of a test in general, where a greater area means a more useful test. The AUC is computed as

Results
Weka 3.8.3 software was used to evaluate the predictive capabilities of the used factors and based on the information gain ratio with the average merit using 10-fold crossvalidation (Table 2). Aquifer saturated thickness had the highest average merit (0.31), followed by rainfall (0.238), hydraulic conductivity (0.189), depth to groundwater (0.185), specific yield (0.183), elevation (0.161), geology (0.062), fault density (0.058), drainage density (0.048), soil (0.048), LULC (0.046), and distance to faults (0.038). The average merit values for slope, TWI, and SPI are zero, and thus these factors were excluded because they were unlikely to play a role in regulating groundwater production in this area.
Based on the outcomes of the training process (Table 3)

Results Validation
After training, the test dataset was passed to the algorithms (Table 3 and Figure 10). In terms of overall ACC, all models performed very well, with the best performance by DJ (0.77), and AP (0.75). In terms of recall, the best performing model was BDT (0.81), followed by LD-SVM (0.80), DJ (0.79), and DF (0.77). The F-score was best for DJ (0.86), followed by BDT (0.85) and LD-SVM (0.83). The other models had F-scores between 0.60 and 0.70 and are thus classified as moderate-performance. Finally, all models had AUC > 0.80 (very good to excellent), with DJ and BDT having the highest values (0.94), followed by DF (0.92) and LD-SVM (0.90).
(a) Training and testing revealed that the DJ, BDT, DF, and LD-SVM models performed best in demarcating the groundwater productivity in the study area. The probability values of these models for the training and testing stage were exported to ArcGIS 10.8.1 and interpolated to map groundwater potential based on the predicted probability of 0 (low) to 1 (high). We employed a natural break classification system in the GIS platform to reclassify the probability values into five groups-namely, very low, low, moderate, high, and very high ( Figure 11)-to build a theme zoning map that could be more readily understood by end-users and policymakers. The natural break classification scheme is the most common system used in groundwater potential mapping [5,23,[55][56][57]. We reduced the five groups to three zones (very low-low, moderate, and high-very high) and listed the zonal areas calculated by the best four models in Table 4. Overall, the spatial distributions of groundwater potential zones for the DJ, DF, and LD-SVM models were similar. The very low-low zone occupied 29.3-38% of the study area, mainly in the north and northeast. The moderate zone covered 31.8-33.8% of the study area, mainly in the south. The high-very high zone encompassed 34.0-36.9% of the study area, primarily in the middle. For the BDT model, the groundwater potential zones were somewhat different: the very low-low zone was distributed over large areas in the north, middle, and south (38% of the study area), the moderate zone covered 23% of the study area, and the high-very high zone encompassed 39%. Training and testing revealed that the DJ, BDT, DF, and LD-SVM models performed best in demarcating the groundwater productivity in the study area. The probability values of these models for the training and testing stage were exported to ArcGIS 10.8.1 and interpolated to map groundwater potential based on the predicted probability of 0 (low) to 1 (high). We employed a natural break classification system in the GIS platform to reclassify the probability values into five groups-namely, very low, low, moderate, high, and very high ( Figure 11)-to build a theme zoning map that could be more readily understood by end-users and policymakers. The natural break classification scheme is the most common system used in groundwater potential mapping [5,23,[55][56][57]. We reduced the five groups to three zones (very low-low, moderate, and high-very high) and listed the zonal areas calculated by the best four models in Table 4. Overall, the spatial distributions of groundwater potential zones for the DJ, DF, and LD-SVM models were similar. The very low-low zone occupied 29.3-38% of the study area, mainly in the north and northeast. The moderate zone covered 31.8-33.8% of the study area, mainly in the south. The high-very high zone encompassed 34.0-36.9% of the study area, primarily in the middle. For the BDT model, the groundwater potential zones were somewhat different: the very low-low zone was distributed over large areas in the north, middle, and south (38% of the study area), the moderate zone covered 23% of the study area, and the high-very high zone encompassed 39%.
When it comes to developing strategies to manage the aquifer in the region, the discrepancy in the groundwater potential map between the second-best algorithm (BDT) and the remainder of the algorithms may cause some difficulty for hydrogeologists and decision makers. To reduce this difference (i.e., the uncertainty), the ensemble approach was used to produce three maps: the first map relied on all four best algorithms (i.e., DJ, BDT, DF, and LD-SVM), Figure 12a, while the second map was produced by the ensemble of algorithms that gave a similar pattern to the distribution of groundwater zones in the study area (DJ, DF, and LD-SVM), Figure 12b. The third map was produced by the ensemble of all used models (Figure 12c). When comparing these two maps, it can be seen that the distribution of groundwater potential zones is very similar for the first two maps and slightly different from the third map. The very low-low zones occupied 77.6% (1976 km 2 ) and 77.5% (1974.8 km 2 ) of the study area, the moderate zone occupied 9.9% (251.6 km 2 ) and 10.2% (258.4 km 2 ), and the high-very high zones occupied 12.5% (319.4 km 2 ), for the first and second ensemble models, respectively. In case of the third map (all used models), the very low-low zones were distributed over 72.8% (1852.9 km 2 ) of the study area, the moderate zone covered 13.7 (348.3 km 2 ), and the high-very high zones occupied 13.6% (345.8 km 2 ). Perhaps, the reason for the result similarity is that all algorithms used performed well, especially in the training stage. When using the ensemble approach, the weaknesses of the algorithms that performed fairly low in the testing stage may be mitigated, which could explain why these maps show the same pattern and approximate occupied potential areas.

Source of Result Uncertainty
The uncertainty in the results can be attributed to either the physical model or the method used to integrate the data (i.e., the algorithm) [9]. The physical models here mean the groundwater-influencing occurrence factors. We used both surface and subsurface factors to delineate groundwater potential zones; thus, the maps produced can be used to quantify groundwater potentiality, but only for the shallow groundwater aquifer system in the study area. When attempting to assess the deep aquifers in comparison with the shallow ones, more detailed research should be conducted. Although there is no global agreement on the type and number of factors used in the analysis of groundwater potential, the inclusion of both surface and subsurface factors in this analysis makes it more realistic than using only surface factors, which only reflect the amount of renewable storage of the aquifer.
Another source of uncertainty may have resulted from the use of interpolation techniques, which may be caused by the type of interpolation technique (deterministic or stochastic), the number of samples taken, which is controlled by the nature of the field work, and the financial cost, which may vary depending on the factor used. To reduce the uncertainty in our instance, we used the stochastic kriging technique and its derivative Empirical Bayesian Kriging, which is believed to be more accurate than deterministic models, such as the inverse distance approach. Another significant source of uncertainty is attributed to the ML algorithms, particularly if there is no relationship between the groundwater influential factors and the borehole location target variable. Examining the overfitting (Table 3) revealed that there is no overfitting problem, allowing the findings of training and testing to be used to map the groundwater potential. 021, 13, x FOR PEER REVIEW 24 of 31 Figure 11. Groundwater potential for the best four models.

Source of result uncertainty.
The uncertainty in the results can be attributed to either the physical model or the method used to integrate the data (i.e., the algorithm) [9]. The physical models here mean

Contribution of Groundwater Factors in Developing Groundwater Potential Map
The findings of this research indicate that a combination of subsurface factors (aquifer saturated thickness, aquifer hydraulic conductivity, depth to groundwater, and aquifer specific yield) and surface factors (rainfall and elevation) contribute most to groundwater potential in the study area. Both transmissivity (the product of saturated thickness and hydraulic conductivity) and specific yield have a significant impact on aquifer groundwater storage fluctuation, regulating groundwater productivity and accessible groundwater for extraction. Elevation is a key morphological factor that influences the development of soil and vegetation profiles, which can affect infiltration rates, while rainfall can affect the quantity of runoff generation and percolation through the soil column [35,58]. As the only main source of recharge in this aquifer is rainfall, it is obvious why this factor was more important than other factors used in the analysis of groundwater potentiality. Geology, fault density, drainage density, soil, LULC, and distance to faults were less important (average merits < 0.1) in contributing to groundwater potential in the study area. TWI, SPI, and slope had no influence on the groundwater availability because the study area is nearly flat and there is a relatively smooth slope from northeast to southwest.

Validation of the Models
The results of the present study are in agreement with previous outcomes concerning groundwater potentiality mapping and the usage of machine learning models and their superiority in developing groundwater potential mapping [59][60][61]. In the present study, the DJ model achieved the highest performance for all error measures and was slightly better than BDT in both training and testing stages, followed by DF and LD-SVM. Other algorithms were good in terms of ACC, AUC, and accuracy, but in terms of recall and F-score, they had poor performance. The superiority of BDT in modeling groundwater potential and spring potential is well known [62]. However, to our knowledge, this is the first time the DJ technique has been used to model groundwater potential; therefore, this algorithm is promising in this field, as it has outperformed the master machine learning algorithms such as ANN and SVM, which are known for their predictive abilities.

Distribution of Groundwater Potential Zones
Mapping the probability of groundwater potential in the study area using the DJs, DF, and LD-SVM models showed the same pattern of potentiality distribution, and their results were different from results obtained by BDT. The ensemble technique [9,54] was used to reduce the difference, and three ensemble maps were produced: with all four best performance models, with only similar pattern algorithms, and with all used algorithms. The produced three ensemble maps revealed the same pattern and occupied approximately the same potential areas. The very low-low, moderate, and high-very high zones covered 12.4% (633.3 km 2 ), 10.05 (510 km 2 ), and 77.55% (3950.8 km 2 ) of the study area, respectively.
Overall, the very low-low zones appeared in the areas where the saturated thickness of the aquifer and the hydraulic characteristics had low values compared with other parts of the study area, and this may have been why these portions of the aquifer have low production values, even though these areas receive more rainfall than the middle and southern parts. At the same time, these portions also have a high elevation range and steep slope, and thus, most of the falling rainfall may turn into surface runoff instead of infiltrating to enhance the groundwater storage. On the other hand, high and very high potentiality zones were distributed in the east and west of the middle of the study area. The moderate zone was distributed unevenly between the other zones, and a large part of it was concentrated in the southern part. In general, it can be said that the high potential occupied the middle part of the study area, the moderate zone encompassed the southern part, and the low potential zone covered the northern part. The distribution of potentiality zones was closely related to the subsurface factors that control the ease with which groundwater flows through the aquifer material and the capability of the aquifer material to store groundwater, i.e., the aquifer transmissivity and storativity. These factors in most of the studies related to groundwater potential are neglected because, in most cases, it is difficult or impossible to collect enough data to define the spatial distribution of these thematic layers, especially in poor data countries. Thus, most research depends only on the surficial factors that only determine the renewable storage, which in most cases, especially in arid and semi-arid regions, represents only a small part of the storage available in the aquifer being studied. Therefore, the study of the aquifer itself and its characteristics is very important for mapping the groundwater potential, in addition to the factors that only control the renewable amount of water entering the aquifer system, such as soil, LULC, and topographical-related factors.

Concluding Remarks
In this study, the Microsoft Azure cloud computing service was used for modeling groundwater potential in the Nineveh Plain of northern Iraq using nine machine learning algorithms-namely, ANN, AP, SVM, LD-SVM, BPM, DFs, DJs, BDT, and LG. The following conclusions were drawn from this study. First, the most important factors in groundwater potential are aquifer saturated thickness, followed by rainfall, hydraulic conductivity, depth to groundwater, and specific yield. Most of these are subsurface factors that control the ability of the aquifer to transmit and store water, and it is important to incorporate these factors in developing groundwater potential maps. The other factors that have a role in controlling groundwater potentiality include elevation, geology, fault density, drainage density, soil, and LULC. Second, the DJ and BDT models were the best-performing ML models, followed by DF and LD-SVM. The DJ model, which does not appear to have been used for modeling groundwater potential mapping, outperformed well-known, highly efficient ML algorithms. Third, the high-very high groundwater potential zone in the study area mostly covers the middle part and is associated with high values of aquifer hydraulic characteristics. The ensemble technique is very efficient for reducing the uncertainty in groundwater potential analysis and can be used when the spatial distribution of groundwater potentiality zones produced by the different algorithms is different. Finally, using a cloud computing service offers an advanced environment in which to run and test multiple algorithms for modeling groundwater potential in a fast and inexpensive way.