The Predictive Capability of a Novel Ensemble Tree-Based Algorithm for Assessing Groundwater Potential

Understanding the potential groundwater resource distribution is critical for sustainable groundwater development, conservation, and management strategies. This study analyzes and maps the groundwater potential in Busan Metropolitan City, South Korea, using random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGB) methods. Fourteen groundwater conditioning factors were evaluated for their contribution to groundwater potential assessment using an elastic net. Curvature, the stream power index, the distance from drainage, lineament density, and fault density were excluded from the subsequent analysis, while the nine other factors were used to create groundwater potential maps (GPMs) using the RF, GBM, and XGB models. The accuracy of the resultant GPMs was tested using receiver operating characteristic curves and the seed cell area index, and the results were compared. The analysis showed that the three models used in this study satisfactorily predicted the spatial distribution of groundwater in the study area. In particular, the XGB model showed the highest prediction accuracy (0.818), followed by the GBM (0.802) and the RF models (0.794). The XGB model, which is the most recently developed technique, was found to best contribute to improving the accuracy of the GPMs. These results contribute to the establishment of a sustainable management plan for groundwater resources in the study area.


Introduction
Groundwater quality is typically superior to that of surface water, and is less susceptible to weather changes and contamination from the land surface. Therefore, groundwater is widely used as a secondary water source with a reliable and safe supply of water resources. However, in recent decades, ensuring sustainable groundwater resources has been threatened by an increased demand and the excessive exploitation of groundwater resources that result from population growth and economic development.
Because groundwater resources are not infinite, it is imperative to secure sustainable and high-quality water resources through efficient utilization and systematic management. Groundwater potential maps (GPMs) are useful to establish sustainable groundwater development, conservation, and management strategies [1,2]. These maps show where groundwater can be accessed without expensive and labor-intensive endeavors and provide insights into further groundwater resources. For this reason, it is essential to secure an accurate spatial prediction in potential groundwater resource areas.
To this end, GPMs based on geographic information systems and remote sensing are widely used. In earlier studies, various statistical techniques, such as the frequency ratio, index of entropy, weights of evidence, evidential belief function, and logistic regression, have been applied [3][4][5][6][7][8][9]. These techniques are easy to apply, and they make it straightforward to interpret the correlations between groundwater and groundwater conditioning factors.
Recently, machine learning techniques, a branch of artificial intelligence, have been widely adopted in various fields. These techniques rely on the concept that systems

Study Area
Busan Metropolitan City served as the study area. It is located at the southeastern end of the Korean Peninsula at 128°45′54″-129°18′13″ E and 34°53′12″-35°23′36″ N. In this study, the groundwater potential was analyzed for inland areas, excluding some island areas located at the bottom of Busan Metropolitan City; the study area covers a total area of approximately 747 km² (Figure 1). The eastern part of the study area is hilly, with an elevation of 400-800 m, whereas the western portion consists of low-elevation plains. Busan Metropolitan City is in the temperate monsoon climate zone, and touches the Straits of Korea; thus, it is characterized by an oceanic climate. From 1981 to 2010, the annual average temperature was 14.7 °C, and the average maximum and minimum temperatures were 18.9 °C and 11.3 °C, respectively, showing little difference between summer and winter temperatures. The annual average precipitation was 1519.1 mm, which is considerably higher than the national average (1279.5 mm), and approximately 62% of the annual precipitation is concentrated in May-August [40]. Most of the study area is forested (approximately 45%), while urban and agricultural areas account for approximately 25% and 16%, respectively.
As of the end of 2018, the amount of groundwater used in Busan Metropolitan City was 28,558,796 m³/year, which comprises approximately 29% of the exploitable groundwater resources (97,553 thousand m³/year). Approximately 76% of the total groundwater use is for municipal water, followed by agricultural use (fisheries) (approximately 14%), other uses (approximately 6%, including bottled water and hot springs water), and industrial use (approximately 4%). The amount of groundwater use per unit area of the city is 37,092 m³/year/km², which is higher than the nationwide groundwater use per unit area (29,074 m³/year/km²) [40].
In the study area as a whole, there is much room for further groundwater development; however, nine out of 16 cities/counties/districts used groundwater in excess of the exploitable groundwater resource capacity. In addition, charges for groundwater use were imposed on the entire Busan Metropolitan City area. Therefore, Busan Metropolitan City is a region with relatively high groundwater dependence, and a management plan should be developed to ensure that groundwater resources are used more efficiently and effectively.

Materials and Methods
The present study was conducted by applying the following steps (Figure 2): (1) designing a geospatial database that included a groundwater wells feature class and conditioning factors; (2) selecting conditioning factors by use of the feature selection function; (3) generating training and validation datasets for modeling groundwater potential; (4) generating GPMs using RF, GBM, and XGB models; and (5) validating the produced GPMs and comparing their performance. The dataset was prepared with a spatial resolution of 10 m, and the maps were generated using ArcGIS version 10.5 (ESRI, Inc., Redlands, CA, USA). The statistical computations of all three models were conducted using R version 3.5.2 (R Foundation for Statistical Computing, Vienna, Austria).
The groundwater well data used in this study were collected from data obtained through field observations and measurements by Busan Metropolitan City, the groundwater basic survey report, the National Groundwater Information Center (gims.go.kr, accessed on 28 January 2020), and the Korea Rural Community Corporation. The 153 collected groundwater wells were randomly divided into a model training dataset (70%; 107 wells) and a model validation dataset (30%; 46 wells). Figure 1 shows the locations of the groundwater well data used in this study.
To apply machine learning, data from areas without any groundwater wells were also needed to analyze the groundwater potential. The geographical data of the area without wells were extracted with an interval of 20 px (100 m). Then, the same number of non-well points as well points was chosen by random selection. A value of 0 was assigned to these points for analysis using the GBM, RF, and XGB models.
Finally, the training and validation datasets were generated by extracting the values of all groundwater conditioning factors at these locations, yielding 214 training points and 92 validation points. The training dataset was used to fit the RF, XGB, and GBM models. The validation dataset was used to verify the modeled outputs.
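The sampling scheme described above can be sketched as follows. This is a minimal illustration rather than the procedure actually run in R: the function name `build_datasets`, the integer point identifiers, the size of the candidate absence pool, and the random seed are all assumptions introduced here.

```python
import random

def build_datasets(well_ids, candidate_absence_ids, train_fraction=0.7, seed=0):
    """Split wells 70/30 and pair each subset with an equal number of non-well points."""
    rng = random.Random(seed)
    wells = list(well_ids)
    rng.shuffle(wells)
    n_train = round(train_fraction * len(wells))
    train_wells, valid_wells = wells[:n_train], wells[n_train:]

    # Draw as many non-well points as there are wells (label 0 versus label 1).
    absences = rng.sample(list(candidate_absence_ids), len(wells))
    train_abs, valid_abs = absences[:n_train], absences[n_train:]

    train = [(i, 1) for i in train_wells] + [(i, 0) for i in train_abs]
    valid = [(i, 1) for i in valid_wells] + [(i, 0) for i in valid_abs]
    return train, valid

# 153 wells; the pool of 5000 candidate non-well points is a hypothetical size.
train, valid = build_datasets(range(153), range(1000, 6000))
print(len(train), len(valid))  # 214 92
```

With 153 wells, the 70/30 split gives 107 and 46 wells, and adding the same number of non-well points reproduces the 214 and 92 points reported above.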

Groundwater Conditioning Factors
A total of fourteen groundwater conditioning factors were selected based on a literature review of previous studies. These factors were constructed using thematic maps such as topographic digital maps, geological maps, and land cover maps that were obtained from the government and related organizations (Table 1). Topographic factors such as elevation, slope degree, slope aspect, and curvature were extracted from a digital elevation model (DEM) created using a 1:5000 topographic digital map from the National Geographic Information Institute (Figure 3). The topography of a region is subject to erosion and sedimentation, which in turn affect the physicochemical properties of the soil, and the concentration and movement of surface water and groundwater; thus, topography is one of the key factors that influences a wide range of environmental variables [41].
Hydrological factors include the topographic wetness index (TWI), the stream power index (SPI), distance from drainage, and drainage density (Figure 4). Both TWI and SPI are factors used to consider the flow characteristics of surface water and groundwater according to topographic factors and are calculated using the following equations [42]:
$$\mathrm{TWI} = \ln\left(\frac{\alpha}{\tan\beta}\right)$$
$$\mathrm{SPI} = A_s \times \tan\beta$$
where A_s is the specific catchment area, β is the local slope gradient, and α is the local upslope area. In addition, the drainage density is a factor related to permeability and surface runoff. Higher values of drainage density lead to lower values of permeability and higher values of surface runoff. Moreover, drainage density has an indirect effect on the study area's groundwater potential [43]. The drainage density of each cell was calculated by dividing the total length of the drainage (km) by the surface area (km²) of the same cell. The drainage density values were calculated using the line density function, whereas the distances from the drainage were determined using the Euclidean distance function in the ArcGIS 10.5 software.
For the geological factors of this study, lithology, distance from lineament, lineament density, distance from fault, and fault density were considered (Figure 5). These factors were acquired from the geologic map (1:50,000) of the Korea Institute of Geoscience and Mineral Resources. The lithology of the study area was classified into igneous rock, alluvial rock A, alluvial rock B, and metamorphic rock. In this case, alluvial rocks were further categorized according to permeability; alluvial rock A refers to permeable rocks composed of sandstone and gravel, and alluvial rock B refers to non-permeable rocks composed of shale and clay. The Geomatica 2016 software (PCI Geomatics, Markham, ON, Canada) was used to extract the lineament by utilizing a hill-shade map that was generated from the DEM. The hill-shade map was generated by rendering the images considering a solar zenith angle of 45° and solar azimuth angles of 45°, 90°, and 135°.
The land cover was extracted using a 1:25,000 sub-basin land cover map published by the Ministry of Environment (Figure 6). The sub-basin land cover map divides the land cover into 23 categories using the SPOT-5 image. In this study, these items were reclassified into seven categories: urban, agricultural, forest, grassland, wetlands, bare land, and water.
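As a small numerical illustration of the two hydrological indices introduced earlier in this section, TWI and SPI are commonly written as ln(α / tan β) and A_s × tan β. The sketch below evaluates these widely used forms for a single cell; the function names, the use of slope in degrees, and the minimum-slope floor that avoids division by zero on flat cells are assumptions made here, not details reported in the study.

```python
import math

def twi(upslope_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index, ln(area / tan(slope)).

    min_slope_deg is an assumed floor so that flat cells do not divide by zero.
    """
    slope = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(slope))

def spi(specific_catchment_area, slope_deg):
    """Stream power index, area * tan(slope)."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))

# Hypothetical cell: contributing area of 100 units on a 45 degree slope.
print(twi(100.0, 45.0), spi(100.0, 45.0))
```

Gentle slopes with large contributing areas give high TWI values, whereas steep slopes with large contributing areas give high SPI values.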

Selection of Groundwater Conditioning Factors
As many groundwater conditioning factors were considered in this study, selecting the most appropriate variables by reducing redundancy played a key role in the evaluation of groundwater potential. Removing noise in this way improves the model accuracy, computation speed, and model analysis capability. Therefore, in this study, feature selection was performed prior to model analysis.
In the present study, irrelevant and insignificant variables were identified and removed using an elastic net (Enet). As a regularized regression method, Enet is used to overcome the limitations of ordinary least squares regression. In this method, a penalty parameter is used as a regularization parameter, which represents the bias added to the regression coefficients in the equation [44]. A linear regression model with L1 regularization is called the least absolute shrinkage and selection operator (LASSO), while a model with L2 regularization is called ridge regression [45].
Enet provides a balance between the two, because it uses a combination of the L1 and L2 regularization methods. The L1 regularization method shrinks some regression coefficients exactly to zero to generate a sparse model. In comparison, the L2 regularization method places no constraint on the number of selected variables and supports the grouping effect; this stabilizes the L1 regularization path [46,47]. Therefore, Enet regression is well suited to variable selection because of its flexibility, its ability to select groups of variables, and its selection of variables based on their correlation and predictive capability [48].
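To make the interplay of the two penalties concrete, the following is a minimal coordinate-descent sketch for a linear model with the objective (1/2n)·||y − Xb||² + λ·(α·||b||₁ + (1 − α)/2·||b||₂²), the parameterization used by the "glmnet" package. It is not the glmnet implementation: it omits the intercept, standardization, and convergence checks, and the toy data and function names are assumptions for illustration only.

```python
def soft_threshold(value, threshold):
    """L1 shrinkage: move value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, lam, alpha, n_sweeps=100):
    """Coordinate descent for a linear elastic net without intercept."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                fit_without_j = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            # The L1 part zeroes weak coefficients; the L2 part shrinks the rest.
            coef[j] = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
    return coef

# Toy data: y depends only on the first column; the second column is unrelated.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2, -2, 2, -2]
coef = elastic_net(X, y, lam=0.1, alpha=0.5)
print(coef)
```

In this toy case the coefficient of the unrelated column is exactly zero, which is the behavior that allows non-contributing conditioning factors to be dropped, while the coefficient of the informative column is shrunk slightly below its least squares value of 2.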

Random Forest
Ensemble learning generates multiple prediction models and combines them to create a final prediction model. RF is a typical tree-based ensemble algorithm that uses the concept of bagging. This learning and prediction algorithm is popular because it recursively splits the data at each node of a tree using randomly selected data and variables, and it utilizes an error minimization method [49].
RF generates multiple bootstrapped samples using a given training dataset, and then, uses them to create a prediction model by combining them. In this case, bootstrapping is a type of resampling, where a large number of smaller samples of the same size is repeatedly drawn with replacement, thereby ensuring the reliability of the prediction model and contributing to the improvement of prediction performance. Notably, the data not extracted are referred to as "out-of-bag (OOB)" and are used to estimate the prediction error and evaluate the variable importance [50]. The class membership and design of the model (output) are decided by a majority voting process among all trees [51].
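The bootstrap, out-of-bag, and voting steps described above can be sketched as follows. This is only an illustration of the resampling logic, not the "randomForest" implementation; the helper names and the random seed are assumptions, and no trees are actually grown here.

```python
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; the rows never drawn are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob

def majority_vote(votes):
    """Return the class predicted by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(42)
in_bag, oob = bootstrap_indices(214, rng)  # 214 = size of the training dataset
print(len(set(in_bag)), len(oob))
print(majority_vote([1, 0, 1]))
```

On average roughly one third of the rows are left out of each bootstrap sample, so every tree has its own out-of-bag rows on which its error can be estimated without a separate hold-out set.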

Gradient Boosting Machine
GBM, also known as gradient-boosted regression tree or gradient tree boosting, is a boosting ensemble algorithm. As in the case of bagging, the boosting algorithm also generates multiple prediction models. In contrast to bagging, the boosting method sequentially generates multiple decision trees using information from previously grown trees. More specifically, in the gradient boosting process applied in this method, a gradient descent algorithm that minimizes the loss for the entire ensemble is used to fit each tree to the residuals of the previous model [26]. The final objective of the method is to minimize the model bias by making the weak learner (each tree) focus on the "harder" samples [52].
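The sequential fitting of trees to residuals can be illustrated with a deliberately small sketch. It assumes a squared-error loss (for which the negative gradient is simply the residual), a single predictor, and one-split trees (stumps); the "gbm" package used in this study supports other losses and deeper trees, and the data below are hypothetical.

```python
def fit_stump(x, residuals):
    """Find the single threshold split that best fits the residuals."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]

def predict_stump(stump, xi):
    t, lm, rm = stump
    return lm if xi <= t else rm

def gradient_boost(x, y, n_rounds, shrinkage):
    """Start from the mean, then add shrunken stumps fitted to current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + shrinkage * predict_stump(stump, xi) for pi, xi in zip(pred, x)]
    return base, stumps, pred

def mse(y, pred):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 2.0, 4.1, 5.0, 5.2]
base, _, pred_1 = gradient_boost(x, y, n_rounds=1, shrinkage=0.1)
_, _, pred_50 = gradient_boost(x, y, n_rounds=50, shrinkage=0.1)
print(mse(y, [base] * len(y)), mse(y, pred_1), mse(y, pred_50))
```

The training error falls as rounds are added, and a small shrinkage value makes each tree contribute only a fraction of its fit, which is why low learning rates are usually paired with many trees.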

Extreme Gradient Boosting
XGB is a boosting ensemble algorithm that can be regarded as an improved GBM. The XGB model integrates several weak learners (individual trees) to develop a strong learner through additive learning [53]. Thus, both XGB and GBM follow the gradient boosting principle; however, XGB improves the training process and is more resistant to overfitting. To this end, XGB uses second-order derivatives of the loss function to obtain more accurate trees, whereas ordinary GBM uses first-order derivatives [25,30]. In XGB, parallel computation is automatically implemented during training to enhance computational efficiency [53]. In addition, it incorporates various regularization features to prevent overfitting. The final prediction of XGB is the sum of the weighted contributions of all decision trees used [25].
Table 2 shows the results of applying the Enet algorithm to the fourteen groundwater conditioning factors used in this study using the "glmnet" package. Enet has two tuning parameters, λ and α, which represent the overall strength of the penalty and the balance between the L1 (LASSO) and L2 (ridge) penalties, respectively [44]. The optimized values of λ and α were 0.05043 and 0.5, respectively, and were obtained using 10-fold cross-validation. The results represent the importance (contribution) of the conditioning factors in predicting groundwater potential. If a conditioning factor does not contribute to the groundwater potential, its regression coefficient is shrunk to zero and no value is reported. As a result, the five conditioning factors with no values (curvature, SPI, distance from drainage, lineament density, and fault density) were excluded from the further analysis of groundwater potential because these conditioning factors may negatively affect the accuracy of the model.

Groundwater Potential Mapping
The RF, GBM, and XGB models were trained using the training dataset to build the groundwater potential models. To improve predictive performance, the hyperparameters of each model should be optimized. In the training process, the hyperparameters of each model were optimized using a grid search implemented with the "caret" package. In the present study, GPMs were obtained using the RF, GBM, and XGB models based on these optimized hyperparameters.
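The idea of a grid search can be sketched in a few lines: every combination of candidate hyperparameter values is scored, and the best-scoring combination is kept. The sketch below is generic and is not the "caret" implementation; `toy_score` is a hypothetical stand-in for a cross-validated performance measure, rigged purely for illustration so that its optimum coincides with the RF values reported in this study, and the candidate grids are assumed.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Score every combination in param_grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Hypothetical surrogate; a real score_fn would fit a model with these
    # hyperparameters and return, for example, a cross-validated AUROC.
    return -abs(params["ntree"] - 370) / 1000 - abs(params["mtry"] - 3) / 10

grid = {"ntree": [100, 370, 500], "mtry": [2, 3, 4]}  # assumed candidate values
best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

The cost grows with the product of the grid sizes, so models with many hyperparameters, such as XGB, require either coarse grids or considerably more computation.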
Finally, the GPMs were reclassified into five potential classes using the natural breaks classification method, which follows the natural groupings inherent in the data. Each GPM had a different range of groundwater potential index (GPI) values, and there is no specific standard for setting the thresholds between potential classes. Thus, we used the natural breaks classification method, which automatically determines thresholds according to the principle of minimizing the variance within classes and maximizing the variance between classes [54].
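The principle behind natural breaks can be shown with a small exhaustive search that tries every way of cutting the sorted values into contiguous classes and keeps the cut with the smallest within-class sum of squared deviations. This brute-force sketch is only practical for small samples and is not the optimized algorithm used by GIS software; the GPI values are hypothetical, and three classes are used instead of five to keep the example short.

```python
from itertools import combinations

def sse(values):
    """Sum of squared deviations from the class mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def natural_breaks(values, n_classes):
    """Return the upper bound of every class except the last."""
    data = sorted(values)
    best_cost, best_cuts = float("inf"), None
    for cuts in combinations(range(1, len(data)), n_classes - 1):
        bounds = (0,) + cuts + (len(data),)
        cost = sum(sse(data[a:b]) for a, b in zip(bounds, bounds[1:]))
        if cost < best_cost:
            best_cost, best_cuts = cost, cuts
    return [data[c - 1] for c in best_cuts]

# Hypothetical GPI sample with three visible groups.
gpi = [0.31, 0.33, 0.35, 0.50, 0.52, 0.54, 0.66, 0.68, 0.70]
breaks = natural_breaks(gpi, 3)
print(breaks)  # [0.35, 0.54]
```

Because the thresholds depend on the distribution of values, the same method yields different class ranges for each model's GPM.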

Random Forest
In the RF model, the "randomForest" package was used to obtain the groundwater potential model. The ntree parameter denotes the number of trees in the forest, whereas the mtry parameter denotes the number of variables tested at each node. The optimized values of ntree and mtry were 370 and 3, respectively.

Gradient Boosting Machine
In the GBM model, a priori parameters of n.minobsinnode, interaction.depth, bag.fraction, and shrinkage were optimized utilizing the "caret" package. Then, the groundwater model was obtained using the "gbm" package. In this model, n.minobsinnode denotes the minimum number of observations in a terminal node, interaction.depth denotes the maximum depth of a tree, bag.fraction denotes the subsampling fraction, and shrinkage denotes the learning rate. The optimization values for the parameters of n.minobsinnode, interaction.depth, bag.fraction, and shrinkage were found to be 8, 8, 0.9, and 0.01, respectively.
According to the results of this model, the GPM was generated by reclassifying the areas into the same five potential classes used for the RF model. The areas of very low potential comprised 34.66%, whereas areas of low, moderate, high, and very high potential comprised 21.04%, 14.76%, 14.15%, and 15.39%, respectively. The area percentage of the very high potential class in the GBM GPM was found to be higher than that in the GPMs generated by the RF and XGB models (Figure 7b, Table 3).

Extreme Gradient Boosting
The XGB model has the largest number of tuned hyperparameters (six) among the three models. The optimized values were eta (learning rate) = 0.01, max_depth (maximum depth of a tree) = 8, min_child_weight (minimum sum of instance weight) = 1, subsample (subsample ratio of the training instance) = 0.7, colsample_bytree (subsample ratio of columns) = 0.6, and gamma = 0.3, obtained using the "caret" package; the groundwater model was generated using the "xgboost" package.
The results of this model indicate that the GPI ranged from 0.31 to 0.70. As for the other models, these values were reclassified into five classes based on their potential levels: very low (0.31-0.39), low (0.39-0.46), moderate (0.46-0.52), high (0.52-0.60), and very high (0.60-0.70). The distribution of GPI values for each potential class was similar to that of the GPM generated using the RF model. While the area percentages of the low (26.32%) and high (14.74%) potential classes were lower than those calculated in the GPM generated by RF, the area percentages of the other classes were found to be slightly higher (Figure 7c, Table 3).
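To show how gamma and min_child_weight act during tree growth, the sketch below evaluates the standard second-order expressions for the optimal leaf weight, −G / (H + λ), and for the gain of a candidate split, where G and H are the sums of first and second derivatives of the loss over the samples in a node. The function names and the gradient sums are hypothetical, λ (reg_lambda) is set to an assumed value of 1, and the acceptance rule is a simplification of what the "xgboost" package does internally.

```python
def leaf_weight(grad_sum, hess_sum, reg_lambda=1.0):
    """Optimal leaf output for a node with gradient sum G and hessian sum H."""
    return -grad_sum / (hess_sum + reg_lambda)

def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction from splitting a node, minus the complexity cost gamma."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma

def split_allowed(g_left, h_left, g_right, h_right,
                  reg_lambda=1.0, gamma=0.3, min_child_weight=1.0):
    """Simplified rule: both children need enough hessian mass and a positive gain."""
    if h_left < min_child_weight or h_right < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma) > 0

# Hypothetical sums; for logistic loss, gradient = p - y and hessian = p * (1 - p).
print(leaf_weight(-4.0, 3.0))
print(split_gain(-4.0, 3.0, 4.0, 3.0, gamma=0.3))
print(split_allowed(-4.0, 3.0, 4.0, 3.0), split_allowed(-4.0, 0.5, 4.0, 3.0))
```

Larger values of gamma and min_child_weight therefore prune weakly supported splits, which is one of the ways in which XGB limits overfitting.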

Model Validation and Comparison
The performance and predictive capacity were analyzed using the ROC curve. The ROC curve was plotted using the true-positive rate (sensitivity) and the false-positive rate (1 - specificity). The area under the ROC curve (AUROC) ranges between 0.5 and 1.0, and a model's classification accuracy is considered to be high when the AUROC is > 0.8. The AUROC values were analyzed for each GPM to determine their success rates; all were found to be > 0.966. According to these results, all GPMs fit the data very well (Table 4, Figure 8a). Moreover, the AUROC values of the prediction rates were analyzed using the validation dataset. According to the results, the XGB model provided the highest value (0.818), followed by the GBM model (0.802) and the RF model (0.794) (Table 4, Figure 8b). These results reveal that the XGB model was the best groundwater prediction model among the three models.
In addition, the GPMs produced from the three models were validated based on the seed cell area index (SCAI). SCAI is the ratio of the percentage of all pixels in each potential class of a GPM to the percentage of groundwater well pixels in that class. Generally, SCAI values gradually decrease from very low to very high [55,56], similar to the results of this study (Table 5). At the "very high" class, the SCAI values of the RF, GBM, and XGB models were 0.09, 0.12, and 0.12, respectively. Overall, the models used in this study were suitable for modeling groundwater potential.
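Both validation measures are simple to compute, as the following sketch shows. The AUROC is evaluated here as the fraction of (well, non-well) pairs in which the well location receives the higher GPI, with ties counted as one half, which equals the area under the ROC curve. SCAI is taken as the area percentage of a class divided by the percentage of wells falling in that class, the form under which values fall toward the very high class. The labels, scores, and percentages are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def scai(class_area_pct, class_well_pct):
    """Seed cell area index of one potential class."""
    return class_area_pct / class_well_pct

# Hypothetical validation points: 1 = well, 0 = non-well.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3]
print(auroc(labels, scores))  # 0.75
print(scai(15.0, 60.0))       # 0.25
```

A class that covers a small share of the area but contains a large share of the wells has a low SCAI, which is the desired behavior for the high and very high potential classes.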

Discussion
This study aimed to produce GPMs using the RF, GBM, and XGB models and to compare their prediction performance for groundwater potential. The main concern was to test the robustness and accuracy of the XGB model. The AUROC values of all success rate curves were relatively high, almost reaching a value of 1. In contrast, the AUROC values of the prediction rate curves decreased by approximately 20%. In particular, the prediction performance of the XGB model was superior to that of the RF and GBM models. In addition, the difference between the success rate and prediction rate curves of the GPM generated by the XGB model was found to be 0.148. This was the lowest value calculated among the three models; the GBM model performed second best (0.196), followed by the RF model (0.206).
These results present the same trend as the findings of previous studies. As mentioned in the introduction, the XGB model has been applied to various fields, but rarely to groundwater research. Recent studies used the XGB model to predict groundwater level [57,58], analyze groundwater salinity [59], and assess groundwater quality [60]. In these studies, the XGB model produced more accurate results than ANN, SVM, and multiple linear regression methods. In particular, Naghibi et al. [61] assessed the groundwater spring potential using RF, parallel random forest, and XGB models. All models yielded AUROC values of approximately 86%, with the XGB model showing the highest values.
The superior results are likely due to the advantages of the XGB model. The XGB model, which approximates the loss function by a Taylor expansion, converges more quickly and accurately than other gradient-based methods. This model can handle missing/sparse data and achieves higher speed and accuracy using cross-validation, early stopping, and parallel processing [31,39,62]. In addition, this model effectively avoids overfitting using several strategies, such as regularization of the objective function, shrinkage, and column subsampling [39]. Through these advantages, the XGB model ensures excellent prediction performance and can be applied not only to classification but also to regression.
The GPMs proposed in this study can be used to establish a sustainable groundwater usage and management policy for the study area. For example, we compared the groundwater potential areas and annual groundwater usage in each administrative district (Figure 9). The groundwater potential area of each district was calculated as the ratio of the area of the high and very high potential classes, taken from the GPM generated by the XGB model, to the district's total area. The annual groundwater usage, as of June 2020, was obtained from the Public Data Portal in Korea [63] and expressed as a percentage of the total usage in the entire study area.
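The two indicators described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the five-class scheme, the district labels, and all numbers are hypothetical and are not taken from Figure 9.

```python
def potential_area_ratio(class_areas_km2):
    """Percentage of a district's area in the high and very high classes."""
    total = sum(class_areas_km2.values())
    favourable = class_areas_km2["high"] + class_areas_km2["very_high"]
    return 100.0 * favourable / total


def usage_rates(annual_usage_m3):
    """Each district's annual usage as a percentage of the study-area total."""
    total = sum(annual_usage_m3.values())
    return {name: 100.0 * use / total for name, use in annual_usage_m3.items()}


# Hypothetical class areas (km^2) for one district; five classes are assumed.
district_a = {"very_low": 10.0, "low": 10.0, "moderate": 10.0,
              "high": 5.0, "very_high": 5.0}

# Hypothetical annual groundwater usage (m^3/year) for three districts.
usage = {"District A": 600_000.0, "District B": 300_000.0, "District C": 100_000.0}

area_ratio_a = potential_area_ratio(district_a)
rates = usage_rates(usage)

print(f"District A area ratio: {area_ratio_a:.1f}%")
for name, rate in rates.items():
    print(f"{name} usage rate: {rate:.1f}%")
```

A district with a high usage rate but a low area ratio would be a candidate for closer management attention.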
Among the administrative districts, Haeundae, Geumjeong, and Gijang each accounted for more than 10% of the total annual groundwater usage in the study area. However, the potential area ratio in these districts was relatively low compared to that in the others. Therefore, these districts need efficient policies for sustainable groundwater use. The GPM, as reference data, could help to establish such policies and support decision-making processes. In the future, the accuracy of the GPM can be further improved by examining a wider range of groundwater conditioning factors for the study area, such as hydraulic and hydrological factors, topography, pedological factors, and precipitation.
Figure 9. Area ratio and usage rate of each administrative district: (a) distribution map (graduated colors indicate the area ratio, and bar charts indicate the usage rate), and (b) specific data (area ratio indicates the ratio of the area of the high and very high potential classes to the total area of each administrative district; usage rate indicates the percentage of annual groundwater usage (m³/year) relative to that in the total study area).

Conclusions
In this study, GPMs for Busan Metropolitan City, South Korea, were created using RF, GBM, and XGB models, and the results were compared using AUROC and SCAI. For this purpose, a spatial database of groundwater wells and groundwater conditioning factors was constructed. Fourteen groundwater conditioning factors were derived from thematic maps obtained from the government and related organizations. These factors were evaluated for their contributions to groundwater potential using Enet. Finally, nine groundwater conditioning factors were selected: altitude, slope degree, slope aspect, TWI, drainage density, lithology, distance from lineament, distance from fault, and land cover. Groundwater potential analyses and mapping were performed with these nine factors using the RF, GBM, and XGB models. The GPMs produced by RF, GBM, and XGB were evaluated using AUROC and SCAI. Overall, the three GPMs exhibited good performance on the training and validation datasets. These results indicate that all the studied models were successful in creating GPMs for the study area. Because the XGB model outperformed the RF and GBM models, it is the most promising of the three for improving predictive performance. It is suggested that the GPM generated by this model in the present study can be used efficiently and cost-effectively to investigate groundwater resources in Busan Metropolitan City. It can further be used to prepare groundwater management plans for the sustainable use of groundwater resources.
Author Contributions: S.P. wrote the paper and analyzed the data; J.K. suggested the idea for the study. All authors have read and agreed to the published version of the manuscript.