Application of Support Vector Regression and Metaheuristic Optimization Algorithms for Groundwater Potential Mapping in Gangneung-si, South Korea

Abstract: The availability of groundwater is of concern. The demand for groundwater in Korea increased by more than 100% during the period 1994–2014. This problem will increase with population growth. Thus, a reliable groundwater analysis model for regional-scale studies is needed. This study used geographical information system (GIS) data and machine learning to map groundwater potential in Gangneung-si, South Korea. A spatial correlation analysis based on the frequency ratio was applied to determine the relationships between groundwater productivity (transmissivity data from 285 wells) and various factors. This study used four topography factors, four hydrological factors, and three geological factors, along with the normalized difference wetness index, land use, and soil type. Support vector regression (SVR) and two metaheuristic optimization algorithms, namely grey wolf optimization (GWO) and particle swarm optimization (PSO), were used in the construction of the groundwater potential map. Model validation based on the area under the receiver operating characteristic curve (AUC) was used to determine model accuracy. The AUC values of groundwater potential maps made using the SVR, SVR_GWO, and SVR_PSO algorithms were 0.803, 0.878, and 0.814, respectively. Thus, the application of optimization algorithms increased model accuracy compared to the standard SVR algorithm. The findings of this study improve our understanding of groundwater potential in a given area and could be useful for policymakers aiming to manage water resources in the future.
... other factors such as sea water intrusion. In addition, low-probability (brown) areas are in the western part of the study area, which is characterized by rock and forest cover. The characteristics of the western part also have a weak spatial relationship with groundwater presence, which affects the mapping result. A validation step was conducted to assess the reliability of the groundwater potential map from each algorithm.
The accuracy of the groundwater potential maps generated using the three algorithms was then evaluated based on ROC curve analysis of the testing dataset (30% of all data). The AUC values were 0.803, 0.878, and 0.814 for SVR, SVR_GWO and SVR_PSO, respectively (Figure 5). The results imply that the algorithms are useful for distinguishing high groundwater potential areas.


Introduction
Water resources, including surface water and groundwater, are essential for life and are recycled through evaporation, precipitation and surface runoff. Recent climate change projections point to increased spatial and temporal heterogeneity in the water cycle, which would lead to water demand outstripping supply [1]. Over the next several decades, demand for water resources, including groundwater, is expected to increase significantly [2][3][4]. Groundwater is defined as water in a saturated zone that fills the pore spaces between mineral grains, or the cracks and fractures in a rock mass [5]. The World Economic Forum has stated that water shortages will be a global problem in the future. Currently, approximately 20% of water consumed by humans is derived from groundwater, and this proportion is projected to increase over the next several decades [6,7]. Climatic conditions also have an important role in groundwater availability, affecting both spill patterns and runoff time [2]. The widespread use of groundwater in industry, agriculture and everyday life presents challenges related to its management. Climate anomalies and continued population growth are expected to cause shortages of water resources in various regions.
In Korea, groundwater demand increased by more than 100% from 1994 to 2014 [8]. The increasing demand for high-quality water, coupled with the anticipated pressure of global climate change, urgently requires quantitative methods for assessing groundwater production in aquifers. However, because such assessments are costly and time-consuming [5], a reliable approach for assessing aquifer productivity has not been well established. Therefore, the development of a reasonable groundwater potential model is essential for future system development, effective management, and sustainable use of groundwater resources [9]. A reliable groundwater analysis model is also needed to facilitate resource management and identify additional water sources worldwide.
In general, groundwater exploration programs are mainly based on hydrological tests, field surveys and geophysical methods [4,10]. However, these approaches are time-consuming and costly, and demand experienced workers [11]. Moreover, field exploration based on hydrological or geophysical resistivity surveys cannot always represent the factors that influence groundwater conditions and movements [12]. Therefore, it is necessary to develop groundwater analysis methods that can clarify hydrological relationships with groundwater, for example by using specific capacity, transmissivity and yield data [13].
A careful study of the literature shows that groundwater potential maps have different meanings to different authors. In general, groundwater potential is defined in terms of optimal zones for groundwater development or of how likely groundwater is to be present [14]. Groundwater potential mapping estimates the probability of groundwater occurrence in a given area. In general, this mapping involves statistical analysis of various types of field data. Remote sensing (RS) data collection enhances spatial coverage, which increases the type and availability of data [15][16][17][18]. Geographical information system (GIS) technology can be used to assess large areas in a more cost-efficient manner [12,19]. The use of GIS technology has increased in recent years as various spatial modeling techniques have been introduced to evaluate groundwater. Preliminary GIS studies using machine learning and statistical methods can be useful for analyzing groundwater availability according to topographical and geographical factors, among others [7]. Thus, potential groundwater wells can be mapped to make groundwater detection more efficient.
As the volume and complexity of GIS data increase, a reliable model is required to analyze them. Various models have been proposed to assess groundwater potential, including frequency ratio (FR) [20,21], weight of evidence (WoE) [22,23] and evidential belief function (EBF) [24,25] models, as well as machine learning models such as artificial neural network (ANN) [26], random forest (RF) [27,28], logistic regression [29,30], and support vector machine (SVM) [13,31] models. Some of these studies still have limitations in their predictions, which are influenced by the accuracy of the dataset and the internal structure of the model [32]. In addition, the computational cost of handling large datasets with different ranges of validation and training values is a weakness of artificial neural networks [33]. In the context of groundwater mapping, several studies still use indirect indicators such as yield, resistance and spring location rather than hydraulic constants such as transmissivity and specific capacity.
In this study, groundwater potential was mapped and analyzed with a machine learning approach based on support vector regression (SVR), with training and testing datasets derived from hydraulic transmissivity data. SVR, a variant of the SVM, has also been developed and widely used for GIS mapping [34,35]. The principles of SVR and SVM are similar, although SVR has additional parameter settings [36]. The parameter values of the SVR can influence the learning process and the reliability of model results. Therefore, determining the operational parameters is a challenge in obtaining the expected results.
To overcome the challenge of determining these parameters, meta-heuristic optimization algorithms were used in this research, including grey wolf optimization (GWO) [35,37] and particle swarm optimization (PSO) [38,39]. GWO is an optimization algorithm that has been widely used to optimize models in GIS applications [40]. It has been tailored to a wide variety of optimization problems because of its advantages over other swarm intelligence methods: it has very few parameters, and no derivative information is required in the initial search [41]. Likewise, PSO is considered reliable for model optimization, providing soft computation and quick convergence [39]. Using these optimization algorithms is expected to improve the performance of the groundwater model, because a meta-heuristic optimization algorithm can increase the accuracy of model predictions by tuning the SVR parameters.
However, SVR and meta-heuristic algorithms are still rarely used in groundwater mapping research. Therefore, in this study, hybrid algorithms were applied to map groundwater potential, combined with hydraulic datasets derived from transmissivity as direct indicators to obtain better results. This study had the advantage of comparing the accuracy and reliability of the SVR and optimized SVR models for groundwater analysis based on the area under the curve (AUC). The AUC denotes the accuracy of the predicted probability of groundwater occurrence. The FR describes the spatial relationships between dependent and independent variables in the context of groundwater potential; factors are ranked for ease of interpretation. The factors used in this study were obtained from RS data. The results could serve as a reference for policymakers aiming to manage water resources, and for future machine learning-based groundwater potential mapping.

Study Area
The study area was the Gangneung-si area of Gangwon-do Province, located on the east coast of the Korean Peninsula at 37°45′N, 128°54′E. The total coastline length is approximately 73.72 km. With a population of 213,199 people and a population density of 205/km², Gangneung is one of the three largest cities in Gangwon-do Province [42]. A Sentinel-2 optical image of the study area is shown in Figure 1. Gangneung has warmer weather in summer and colder weather in winter than other areas. The months with the highest and lowest average temperatures are July (36.1 °C) and January (−1.8 °C), respectively. The mean annual precipitation in Gangneung is 1320.3 mm/year, with 753.4 mm occurring in summer [43]. The geologic distribution in Gangneung-si consists mainly of Jurassic granite, followed by Precambrian and Triassic age sedimentary rocks around the lower coast [44]. In addition, alluvial deposits are distributed along rivers and tributary streams. The aquifer of the study area is dominantly alluvium, with sedimentary rock in the lowlands [43].
In general, the groundwater in Gangneung-si comes from several sources, including rivers and dams. The available capacity is 1,650,000 m³/year. Groundwater in this area is mainly used for agriculture (71.6%), followed by domestic use (28.4%) [45]. The land use distribution of the study area is 80.4% forest, followed by agricultural land and urban areas. Due to the projected increase in groundwater demand in the area, groundwater potential assessment is needed.

Groundwater Datasets
Groundwater productivity data were calculated based on groundwater transmissivity (T) data obtained from 285 wells in Gangneung-si. T values above the median were included in the inventory dataset used for groundwater potential analysis. Groundwater pumping data were obtained from national- and local-government-level groundwater surveys conducted by the Korea Water Resource Corporation (K-Water) [46].
T represents the rate of flow under a unit hydraulic gradient through a unit width of an aquifer of a given thickness. It is the product of the average hydraulic conductivity and the thickness of the aquifer formation, and can be calculated as follows:

$$T = K b$$

where T is transmissivity, K is hydraulic conductivity, and b is aquifer thickness. A decrease in the drawdown and a thicker aquifer produce higher T values. By combining Equation (2) and Darcy's law, we can calculate the amount of water flowing in aquifer units [47].
In this study, the T data obtained from the well locations described above were applied to the FR and machine learning models, and used to examine various aspects of groundwater. To apply these models, the groundwater productivity data were converted into binary form (0 and 1) [48]. The split criterion was the median transmissivity: values above the median were designated "1", and all other values were designated "0" [12]. The 285 well points were then randomly separated, together with their statistical attributes, into training (70%) and testing (30%) datasets, with approximately half of the well points in each dataset meeting the criterion; these proportions are typical of machine learning studies [49,50]. T data from 199 and 86 wells were included in the training and testing datasets, respectively.
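The labelling and partitioning steps described above can be illustrated with a minimal sketch. The transmissivity values below are synthetic, and the function names, random seed, and log-normal distribution are assumptions made for illustration only; they are not taken from the study.

```python
import random
import statistics


def binarize_by_median(t_values):
    """Label a well 1 if its transmissivity exceeds the median, else 0."""
    median_t = statistics.median(t_values)
    return [1 if t > median_t else 0 for t in t_values]


def split_indices(n_items, train_fraction=0.7, seed=0):
    """Shuffle the indices and cut them into training and testing parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = int(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


# Synthetic transmissivity values for 285 hypothetical wells (not study data).
rng = random.Random(42)
t_values = [rng.lognormvariate(2.0, 1.0) for _ in range(285)]

labels = binarize_by_median(t_values)
train_idx, test_idx = split_indices(len(t_values))
print(len(train_idx), len(test_idx), sum(labels))
```

With 285 wells and a 70% training fraction, truncating to an integer gives the 199 and 86 wells reported in the text. A purely random split does not guarantee an even share of "1" labels in each part; a stratified split would be needed for that.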

Selection of Groundwater-Related Factors
It is important to select the most relevant conditioning factors for groundwater potential mapping. Topographic, geologic, and hydrologic factors can affect the probability of groundwater occurrence, while factors such as land use, climate, and vegetation affect groundwater recharge capacity and demand [51]. Based on a literature review, 13 factors related to groundwater were selected. These comprised topographical factors (slope, slope height, elevation, topographic wetness index [TWI]) and hydrological factors (precipitation, LS-factor, and water density) [15,31,52], as well as geological factors (lithology, distance to fault, lineament density), land use, soil type, and the normalized difference wetness index (NDWI) [53,54]. However, the relationships between these factors and groundwater have not been verified either statistically or quantitatively. In this study, the 13 groundwater-related factors were reviewed with regard to transmissivity using the FR. Afterward, the 13 factors were applied to the SVR algorithm to generate the groundwater probability map. The 13 factors were derived from the RS images shown in Figure 2.
Thematic maps (elevation, slope, slope height, LS factor, and TWI) were derived from a digital elevation model (DEM) constructed from digital topographic maps (scale 1:5000) provided by the National Geography Information Institute (NGII) in 2015. We generated the thematic map by using the GIS application. The topographic maps were constructed based on ground control points from digital aerial photographs and ground surveys. Further calibration of the topographic maps was performed based on field surveys. All data were resampled to a pixel size of 30 × 30 m. Topography is important for runoff generation and retention, and water concentrations [55].
Elevation in the study area is denoted by contour lines and is associated with climate, soil and vegetation conditions [56]. In elevated regions, runoff conditions are higher and infiltration is low, even though precipitation is higher [16]. Bare areas have slower runoff, and infiltration leads to groundwater recharge. The elevation map used in this research is shown in Figure 2a.
The infiltration rate is often analyzed when mapping groundwater potential, and is influenced by slope conditions. On a steep slope, the water flow rate is high, which reduces the infiltration rate in the recharge zone [57,58]. Slope height is related to flow rate and can affect slope stability, which in turn influences the runoff rate. A gentle slope means slower runoff and therefore more time for infiltration. Conversely, a steep slope means more erosion and a shorter residence time. In groundwater potential mapping, steep slopes are usually associated with a low probability of unconsolidated sediment accumulation and recharge [55]. GIS data were analyzed to determine slope variation in the study area and its effect on groundwater potential, as shown in Figure 2b,c.
The TWI, used widely in groundwater potential mapping studies, is shown in Figure 2d. The TWI provides information on the spatial distribution of hydrological variables such as infiltration potential or soil moisture. The TWI also reflects the relation between water accumulation at any point in an area and the gravitational force that drives water downslope [59]. A higher index represents a gentler slope and a larger contributing area, which gives a positive correlation between groundwater occurrence and TWI [5].
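For reference, the TWI is commonly written in the form below. This is the standard textbook formulation; the source does not state which variant or software routine was used, so it should be read as an assumption rather than as the authors' exact definition.

```latex
\mathrm{TWI} = \ln\!\left(\frac{a}{\tan\beta}\right)
```

Here $a$ is the specific catchment area (upslope contributing area per unit contour width) and $\beta$ is the local slope angle. A large $a$ or a small $\beta$ raises the index, which matches the statement that higher values correspond to gentler slopes and larger contributing areas.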
The relationship between the LS factor and groundwater was examined to characterize the water behavior that leads to groundwater potential, as shown in Figure 2e. Precipitation data are crucial for mapping groundwater potential, and can help reveal the relationship between precipitation and groundwater recharge [60]. The precipitation map in this study was based on average precipitation data obtained from 103 weather stations in South Korea for the period 2015-2020, provided by the Korea Meteorological Administration (KMA). We created a spatial map of precipitation (Figure 2f) using inverse distance weighting (IDW) interpolation.
Water density is controlled by the litho-units, structure and morphology of an area, and helps to assess the characteristics of runoff and groundwater infiltration [61]. Water density is related to permeability and surface runoff, which affect groundwater potential. A high water density means that runoff can drain quickly and that infiltration is less likely. However, there are some exceptions because groundwater is expected to accumulate in alluvial sediments in flat areas of the watershed [62]. The water density map for this study is shown in Figure 2g. The NDWI evaluates wetness based on the green and near-infrared bands. The NDWI can describe soil moisture conditions related to the humidity of the vegetation cover. This factor can reflect evapotranspiration conditions, as well as the infiltration rate associated with water table conditions. Even so, the annual average precipitation accumulation factor is used to complement it when considering the potential presence of groundwater [53,63]. In this research, the NDWI of Gangneung-si (Figure 2h) was processed from Landsat-8 OLI data acquired in 2019.
Groundwater is closely associated with the landscape and land use; the landscape is affected by anthropogenic activities [64]. Land use affects groundwater resources by influencing recharge and water demand [51]. Land use is a significant factor affecting the groundwater recharge process, as it influences evapotranspiration, runoff and recharge of the groundwater system [65]. Kompsat-2 and 3 satellite images were used to reconstruct the 2012 land use map (scale 1:50,000) of the Ministry of Environment [66]. We divided the land use types into seven categories: urbanized area, agricultural area, forest area, grassland area, marsh area, bare ground area, and water area (Figure 2i). Soil type is often used in groundwater recharge and evaluation studies. This is mainly because soil permeability is directly related to effective porosity and to particle shape and size, which means that soil type plays an important role in infiltration [67,68]. For this reason, studies that consider soil-related variables are often carried out in conjunction with rainfall and/or recharge [69]. The soil map used in this study (Figure 2j) was based on a 1:25,000 scale map issued by the Rural Development Administration (RDA), which compiled the data through field research in 2007.
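The IDW interpolation mentioned above can be sketched as follows. The station coordinates and precipitation values are hypothetical, and the distance exponent of 2 is a common default chosen here as an assumption; the source does not report the exponent or search radius that was used.

```python
import math


def idw_interpolate(stations, query, power=2.0):
    """Estimate a value at `query` (x, y) from (x, y, value) stations by IDW."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            return value  # the query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations: (x in km, y in km, annual precipitation in mm).
stations = [(0.0, 0.0, 1200.0), (10.0, 0.0, 1400.0), (0.0, 10.0, 1300.0)]
estimate = idw_interpolate(stations, (5.0, 5.0))
print(round(estimate, 1))
```

The query point in this example is equidistant from all three stations, so every station receives the same weight and the estimate reduces to their plain average. Points closer to one station are pulled toward that station's value.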
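Because the text describes the NDWI as being based on the green and near-infrared bands, a minimal per-pixel sketch is given below using the normalized difference (Green − NIR) / (Green + NIR). The reflectance values are invented, the helper name is hypothetical, and returning 0.0 for empty pixels is an arbitrary choice made for this example. For Landsat-8 OLI the green and near-infrared bands are Band 3 and Band 5, although the source does not state which bands or preprocessing were used.

```python
def ndwi(green, nir):
    """Normalized difference of green and near-infrared reflectance."""
    total = green + nir
    if total == 0.0:
        return 0.0  # assumption: treat empty pixels as neutral
    return (green - nir) / total


# Hypothetical 2 x 2 reflectance grids (not study data).
green_band = [[0.30, 0.10], [0.05, 0.20]]
nir_band = [[0.10, 0.30], [0.05, 0.00]]

ndwi_grid = [
    [ndwi(g, n) for g, n in zip(green_row, nir_row)]
    for green_row, nir_row in zip(green_band, nir_band)
]
print(ndwi_grid)
```

The index is bounded between −1 and 1 for non-negative reflectances. Pixels where green reflectance dominates take positive values, and pixels where near-infrared dominates take negative values.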
Lithology is an important factor in the occurrence and distribution of groundwater, and provides information about the water stored in a particular area [64]. Lithology can affect groundwater recharge via its influence on water percolation [50]. The lithology map used in this study was created based on the 1:50,000 scale geological maps of the Korea Institute of Geoscience and Mineral Resources (KIGAM), which was provided in 2015. The lithology of the study area was divided into granite, gneiss, sedimentary rock, alluvium, and limestone categories, as shown in Figure 2k. The occurrence and movement of groundwater are controlled by porosity, permeability, aquifer layout, horizontal distribution, and recharge area. In the east part of the study area, both the Quaternary and Tertiary aquifers have major porosity, which is conducive to the formation of potential groundwater in the coastal aquifers [70].
Lineaments are generally described in studies analyzing fractures, and provide information on the linear properties of geological structures [71]. The distribution of lineaments is related to the location of groundwater, due to its association with the presence of faults, fractures, and joints, all of which impact porosity and permeability [72]. The relationship between lineament density and groundwater productivity can be derived for a given area; the lineament map of our study area is shown in Figure 2l. A fault is a geological structure that should be considered when mapping groundwater potential. Faults influence groundwater potential in a given area. The distribution of faults is related to groundwater storage potential, which in turn affects the groundwater recharge rate [31]. The fault map of this study is displayed in Figure 2m.

Methodology
To map groundwater potential using a machine learning algorithm, several steps must be performed, as described below. First, the spatial correlations between the presence of groundwater and related factors (topographical, geological, and hydrological factors) are calculated using the FR. In this research, we analyzed the spatial correlation between groundwater well locations (T points) and 13 factors related to groundwater productivity, based on the FR of each factor. We obtained the spatial correlations between groundwater and related factors by calculating the ratio between the groundwater productivity area and the whole study area. The FR for each class of each factor was calculated using Equation (3) [73]:

$$FR = \frac{W_i / W}{A_i / A} \quad (3)$$

where $W_i$ is the number of productive wells located in class $i$ of a factor, $W$ is the total number of productive wells, $A_i$ is the area (number of pixels) of class $i$, and $A$ is the total area of the study area.
When the FR is >1, the spatial correlation of a particular class of a given factor with groundwater is stronger, and vice versa for lower FR values [74]. FR results can be used as a reference for groundwater potential mapping [75].
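The FR calculation can be sketched as below, assuming the commonly used form in which the share of productive wells falling in a class is divided by the share of the study area occupied by that class. The class names and counts are invented for illustration.

```python
from collections import Counter


def frequency_ratio(class_of_pixel, class_of_well):
    """FR per class: share of wells in the class / share of area in the class."""
    pixel_counts = Counter(class_of_pixel)
    well_counts = Counter(class_of_well)
    total_pixels = len(class_of_pixel)
    total_wells = len(class_of_well)
    return {
        cls: (well_counts.get(cls, 0) / total_wells) / (n_pixels / total_pixels)
        for cls, n_pixels in pixel_counts.items()
    }


# Hypothetical elevation classes for 100 pixels and 10 productive wells.
pixels = ["low"] * 20 + ["high"] * 80
wells = ["low"] * 6 + ["high"] * 4

fr = frequency_ratio(pixels, wells)
print(fr)
```

In this toy case the low class covers 20% of the area but holds 60% of the wells, so its FR is about 3, while the high class has an FR of about 0.5. This matches the interpretation that values above 1 indicate a stronger spatial correlation with groundwater.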
To apply the machine learning algorithm, T data are converted to binary form, then partitioned randomly into training data (70%) and testing data (30%). The groundwater-related factor maps were produced at 30 m resolution. First, the value of T is determined, and included as an independent variable in the training dataset. Then, all data are classified as categorical or continuous. Continuous variables include the TWI, LS factor, water density, lineament density, slope, distance to fault, and elevation. Categorical variables include lithology, soil type, and land use.
In this study, metaheuristic optimization algorithms (GWO and PSO) were used to determine the operational parameters of the SVR based on a prepared dataset (factors). Then, the SVR machine learning algorithm was used to create a groundwater probability model for Gangneung. To validate the mapping results, the 285 T data points were randomly divided into training and testing datasets, as discussed above. Model validation was carried out through receiver operating characteristic (ROC) curve analysis of the testing dataset (30%); ROC curve analysis is commonly used as an index of model performance to assess predictive accuracy [76]. To quantify model accuracy, the area under the ROC curve (AUC) was calculated. For a model that performs at least as well as chance, AUC values range between 0.5 and 1; higher values indicate more reliable algorithm performance. The workflow of the groundwater potential mapping carried out in this study is provided in Figure 3.
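As an illustration of the validation metric, the AUC can be computed directly from its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The labels and scores below are invented, and this pairwise form is a sketch rather than the software routine used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive-negative pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical test wells: binary T labels and predicted potential scores.
test_labels = [1, 1, 0, 0, 1, 0]
test_scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.5]
auc = roc_auc(test_labels, test_scores)
print(round(auc, 3))
```

Seven of the nine positive-negative pairs in this example are ranked correctly, giving an AUC of 7/9. The pairwise loop is quadratic in the number of samples, which is acceptable for a test set of 86 wells.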

Support Vector Regression
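SVR is based on the SVM algorithm, as stated previously [77]. SVR is one of the most widely used supervised learning methods for regression problems because of its ability to universally approximate multivariate functions to any degree of accuracy [78]. In regression analysis, the correlation, or nonlinear mapping function $f(x)$, between the input and output of the learner is acquired. SVR seeks a nonlinear mapping function that maps the training data $\{(x_i, y_i);\ i = 1, \dots, n\}$ to a high-dimensional feature space. The nonlinear mapping between the input and output of the learner can be defined as in Equation (4) [79]:

$$f(x) = w^{T}\varphi(x) + b \quad (4)$$

where $w$ and $b$ are the coefficients to be adjusted. The empirical risk can be defined as Equation (5):

$$R_{emp}(f) = \frac{1}{n}\sum_{i=1}^{n}\Theta_{\varepsilon}\left(y_i, w^{T}\varphi(x_i) + b\right) \quad (5)$$

where $\Theta_{\varepsilon}$ is the $\varepsilon$-insensitive loss function, which can be calculated using Equation (6):

$$\Theta_{\varepsilon}(y, f(x)) = \begin{cases} |f(x) - y| - \varepsilon, & \text{if } |f(x) - y| \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \quad (6)$$

This equation is used to acquire the best hyperplane for dividing the training data into subsets with the optimal separation distance. SVR is an optimization problem with the following objective function:

$$\min_{w, b, \xi, \xi^{*}} R_{\varepsilon}(w, \xi, \xi^{*}) = \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \left(\xi_i + \xi_i^{*}\right) \quad (7)$$

where $C$ is the trade-off between the first and second terms of the equation, and $\xi_i$ and $\xi_i^{*}$ are slack variables that measure training errors above and below the $\varepsilon$-tube. In this equation, the first term penalizes large weights, which corresponds to maximizing the distance between data points. The insensitivity parameter $\varepsilon$ sets the tolerance within which deviations between $f(x)$ and $y$ are not counted as training errors. The constraints of this optimization problem are shown in Equation (8):

$$y_i - w^{T}\varphi(x_i) - b \le \varepsilon + \xi_i, \quad w^{T}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i, \xi_i^{*} \ge 0, \quad i = 1, \dots, n \quad (8)$$

As noted previously, the SVR parameters affect the accuracy of model predictions. Hence, it is essential to select suitable parameters. In this study, metaheuristic optimization algorithms were used to determine the SVR parameters.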
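The role of the insensitive loss and the empirical risk can be illustrated with a short sketch using the standard ε-insensitive form, in which deviations smaller than ε cost nothing. The targets, predictions, and ε value are invented, and the function names are hypothetical.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon tube, linear in the excess deviation outside."""
    return max(0.0, abs(y_pred - y_true) - epsilon)


def empirical_risk(targets, predictions, epsilon):
    """Mean epsilon-insensitive loss over all training samples."""
    losses = [
        epsilon_insensitive_loss(y, f, epsilon)
        for y, f in zip(targets, predictions)
    ]
    return sum(losses) / len(losses)


# Hypothetical binary targets and model outputs for three wells.
targets = [1.0, 0.0, 1.0]
predictions = [0.95, 0.30, 0.50]
risk = empirical_risk(targets, predictions, epsilon=0.1)
print(round(risk, 3))
```

The first prediction lies within the tube and contributes nothing, while the other two contribute their deviation minus ε. A larger ε therefore tolerates more error before a sample affects the fit, which is one reason why the choice of ε, together with C and any kernel parameters, influences the resulting model.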

Grey Wolf Optimization
GWO is a metaheuristic algorithm that imitates the hunting behavior and social hierarchy of grey wolves [80]. Grey wolves live in groups with a social dominance hierarchy. They engage in various group activities, including hunting prey [81]. Grey wolves are classified according to social status as (from high to low) alpha, beta, delta, or omega. Alpha wolves are tasked with making decisions; beta wolves assist and advise them [82]. Delta wolves obey the leaders, as do omega wolves. In the GWO technique, the alpha, beta, and delta wolves guide the other wolves to the best area for hunting. GWO has several steps based on the hunting behavior of the grey wolf, namely searching for, tracking, chasing, approaching and attacking prey. The location of the other wolves during exploration (hunting) can be updated according to the location of the leader wolves, as in Equation (9):

$$\vec{D} = \left|\vec{C} \cdot \vec{X}_p(t) - \vec{X}(t)\right|, \quad \vec{X}(t+1) = \vec{X}_p(t) - \vec{A} \cdot \vec{D} \quad (9)$$

where $\vec{X}$ represents the position vector of a grey wolf, $\vec{X}_p$ is the position vector of the prey (approximated by the positions of the leader wolves), and $t$ is the iteration number. $\vec{A}$ and $\vec{C}$ are coefficient vectors given in Equation (10):

$$\vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a}, \quad \vec{C} = 2\vec{r}_2 \quad (10)$$

where $\vec{a}$ is linearly decreased from 2 to 0 over the iterations, and $\vec{r}_1$ and $\vec{r}_2$ are random vectors in the range between 0 and 1.
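The following is a minimal sketch of a basic grey wolf optimizer, written under several assumptions: the three leaders are the best positions found so far, each wolf moves to the average of the three leader-guided candidates, and positions are clipped to the search bounds. The toy objective stands in for the SVR error that would be minimized over the SVR parameters; the pack size, iteration count, and seed are arbitrary and not taken from the study.

```python
import random


def grey_wolf_optimize(objective, bounds, n_wolves=20, n_iterations=100, seed=0):
    """Minimize `objective` inside `bounds` with a basic grey wolf optimizer."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # best three positions found so far: alpha, beta, delta
    for iteration in range(n_iterations):
        ranked = sorted(leaders + wolves, key=objective)[:3]
        leaders = [list(position) for position in ranked]
        a = 2.0 * (1.0 - iteration / n_iterations)  # decreases linearly from 2
        for wolf in wolves:
            for d, (lo, hi) in enumerate(bounds):
                guided = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    guided += leader[d] - A * D
                # Average the three candidates and keep the wolf inside bounds.
                wolf[d] = min(max(guided / 3.0, lo), hi)
    return min(leaders + wolves, key=objective)


def toy_error(position):
    """Toy objective with its minimum at (1, -2); stands in for a model error."""
    return (position[0] - 1.0) ** 2 + (position[1] + 2.0) ** 2


best = grey_wolf_optimize(toy_error, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, toy_error(best))
```

In a parameter-tuning setting, each position would hold candidate SVR parameter values and the objective would return a validation error for an SVR trained with them. Early iterations, where the magnitude of A can exceed 1, favor exploration; later iterations pull the pack toward the leaders.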

Particle Swarm Optimization
PSO is an optimization technique that imitates the collective behavior exhibited by a flock of birds, a school of fish, or a swarm of insects. This method is similar to the GA, which uses population fitness data to find an optimal solution to a given problem [83]. PSO is advantageous for optimizing nonlinear problems, showing fast convergence and requiring few computations. These capabilities separate PSO from other evolutionary algorithms, such as the GA. In PSO, each bird in a flock is considered as a particle, which are searched for in n-dimensional space to find the optimal solution (where n is the number of problem parameters) [39]. Particles are scattered randomly within the search space. After every iteration, according to the equation 11 and equation 12 every particle adjusts its location by finding the most specific location that it has ever occupied and also the best one adjacent to its neighbor.
To simulate the behavior of a flock of birds, the location and velocity (the rate of change of location) of the i-th particle at iteration t are given by $x_i^t = (x_{i1}^t, x_{i2}^t, \ldots, x_{in}^t)$ and $v_i^t = (v_{i1}^t, v_{i2}^t, \ldots, v_{in}^t)$, respectively. During model training, the location and velocity of the i-th particle are updated using Equations (11) and (12) [84]:
$$v_{in}^{t+1} = W v_{in}^{t} + C_1 r_1 \left( P_{in}^{t} - x_{in}^{t} \right) + C_2 r_2 \left( P_{gn}^{t} - x_{in}^{t} \right) \tag{11}$$
$$x_{in}^{t+1} = x_{in}^{t} + v_{in}^{t+1} \tag{12}$$
where $W$ is the inertia weight, $C_1$ and $C_2$ are the personal and social learning factors, respectively, $r_1$ and $r_2$ have random values from 0 to 1, and $P_{in}^{t}$ and $P_{gn}^{t}$ are the best locations found so far by particle i and by the swarm, respectively, in dimension n at iteration t. The algorithm continues until the best location for each particle is equal to the best position for all particles. At that point, all particles have converged on one point in space, and the solution to the problem is thus optimized [38].
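The velocity and location updates described above can be sketched as follows. This is an illustrative sketch rather than the code used in this study: the function name `pso_minimize`, the default inertia weight and learning factors, and the fixed iteration budget (used in place of the convergence criterion) are assumptions.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` inside the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    x = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    v = [[0.0] * len(bounds) for _ in range(n_particles)]
    p_best = [list(p) for p in x]            # best location of each particle
    p_val = [objective(p) for p in x]
    g = min(range(n_particles), key=p_val.__getitem__)
    g_best, g_val = list(p_best[g]), p_val[g]  # best location of the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + personal pull + social pull.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))
                # Location update, clipped to the search box.
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)
            val = objective(x[i])
            if val < p_val[i]:
                p_best[i], p_val[i] = list(x[i]), val
                if val < g_val:
                    g_best, g_val = list(x[i]), val
    return g_best, g_val
```

The returned pair is the best location found by the swarm and its objective value. As with GWO, the objective in a tuning setting would be a model error evaluated for a candidate parameter vector.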

Relationships between Groundwater and Related Factors
FR values can provide information on the relationship between groundwater potential and related (topography, hydrology, geology, and land cover) factors. The FR is calculated for each class of each factor based on the transmissivity (T) value. The FR values calculated in this study are shown in Table 1. Groundwater was abundant in areas with low elevation (0-100 m), which had an FR value of 5.34 and thus showed a strong spatial correlation with groundwater. Slope showed an inverse relationship with groundwater availability: groundwater probability was highest for the lowest slope class of 0-3.96 (FR = 4.79), because it is more difficult for groundwater to accumulate on steep slopes owing to the water flow and velocity conditions [55]. Slope height and LS-factor also showed an inversely proportional relationship with groundwater potential. The FR values were highest for the slope height class of 12.74-14.86 (FR = 3.04) and LS-factor class of 0-5.8 (FR = 4.79).
The presence of groundwater was more likely with a higher TWI. Water-retaining ability and water density are promoted by a high TWI (>13.93). The highest FR value for the water density factor was seen in the >11.49 class (FR = 2.79). This spatial correlation of water density is an exception, given the geological and lowland characteristics of this area. The highest FR value for NDWI was in the 0.57 to −0.04 class (FR = 3.75). Taken together, the results show that the probability of groundwater is higher in areas with bodies of water.
The relationships between the presence of groundwater and geological factors were also analyzed. Regarding the distance to fault factor, the highest FR value occurred in the class of >12,624 m (FR = 1.86). Lineament density showed an inversely proportional relationship with the presence of groundwater, and the highest FR value occurred in the class of 0.1-0.77 (FR = 2.43). An area with lower lineament density has better recharge potential and is more likely to hold groundwater [71]. For the lithology factor, the FR calculation showed a high spatial correlation with alluvium (FR = 4.802). Alluvium, as the dominant aquifer structure in this area, is related to groundwater presence.
Regarding soil types, the FR value was highest for immature paddy, at 13.1. Among the land use types, the FR value was highest for urban areas, at 5.98. Urban areas are related to groundwater conditions, as indicated by groundwater consumption and by the effect of anthropogenic activity on the rate of groundwater recharge.
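As a rough illustration of how an FR value is obtained for each class, the sketch below uses the common definition of the frequency ratio: the share of productive wells that fall in a class divided by the share of the study area occupied by that class. This definition, the function name `frequency_ratio`, and the well and cell counts are assumptions for illustration and are not taken from Table 1.

```python
def frequency_ratio(well_counts, cell_counts):
    """Return the FR of each class: share of wells divided by share of area."""
    total_wells = sum(well_counts.values())
    total_cells = sum(cell_counts.values())
    fr = {}
    for cls, cells in cell_counts.items():
        well_share = well_counts.get(cls, 0) / total_wells
        area_share = cells / total_cells
        fr[cls] = well_share / area_share
    return fr


# Hypothetical elevation classes: productive wells and raster cells per class.
wells = {"0-100 m": 50, "100-300 m": 25, ">300 m": 25}
cells = {"0-100 m": 2500, "100-300 m": 2500, ">300 m": 5000}
for cls, value in frequency_ratio(wells, cells).items():
    print(f"{cls}: FR = {value:.2f}")
```

An FR above 1 indicates that wells are over-represented in a class relative to its area (a positive spatial association), whereas an FR below 1 indicates a weak association.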

Construction of Groundwater Potential Maps
Groundwater potential mapping of the Gangneung-si area was performed by applying a machine learning algorithm, as discussed above. A combination of 13 groundwater-related factors served as the independent variables, and can be mainly classified as topography, hydrology, geology, and land use factors. Optimization algorithms (GWO and PSO) were applied to the SVR machine learning method to produce the groundwater potential maps. In the maps, blue indicates a very high probability of groundwater, and brown indicates a very low probability. The mapping result of each algorithm can be seen in Figure 4.
In general, the high-probability (blue) areas were similar among the three algorithms (SVR, SVR_GWO, and SVR_PSO). These high-probability areas are located in the eastern part of the study area, which is associated with alluvium and low elevation.
Based on the existing spatial thematic data, water that accumulates as it flows to lower areas can increase the infiltration rate in the lowlands. Lowland areas dominated by porous alluvium and sedimentary rocks allow the infiltration that leads to groundwater recharge. However, the infiltration rate is highly dependent on the type of land cover, soil properties, and saturation level. On the other hand, the moderate class of SVR_GWO and SVR_PSO was found in the coastal area. This finding could be related to coastal aquifer conditions, which could be influenced by other factors such as sea water intrusion. In addition, low-probability (brown) areas are in the western part of the study area, which is characterized by rock and forest cover. The characteristics of the western part have a weak spatial relationship with groundwater presence, and this in turn affects the mapping result.
A validation step was conducted to assess the reliability of the groundwater potential map from each algorithm. The accuracy of the groundwater potential maps generated using the three algorithms was then evaluated based on ROC curve analysis of the testing dataset (30% of all data). The AUC values were 0.803, 0.878, and 0.814 for SVR, SVR_GWO, and SVR_PSO, respectively (Figure 5). The results imply that the algorithms are useful for distinguishing high groundwater potential areas.
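For readers unfamiliar with the AUC, the following sketch computes it from its rank-based definition: the probability that a randomly chosen positive sample receives a higher predicted score than a randomly chosen negative one, with ties counted as one half. The binary labelling of the test wells and the function name `roc_auc` are assumptions for illustration, not the procedure reported here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve from pairwise comparisons of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical test wells: 1 = productive, 0 = not productive.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(round(roc_auc(labels, scores), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two groups of wells.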
Groundwater mapping accuracy was increased by applying the optimization algorithms to the SVR machine learning method. SVR_GWO and SVR_PSO increased mapping accuracy by 8.81% and 1.64%, respectively, and model performance was generally good based on the AUC values of >0.8 [46]. The increased accuracy was achieved by tuning the parameters of the SVR algorithm based on the optimization results.

Sensitivity Analysis
Sensitivity analysis was conducted to assess the relative influence of each factor on groundwater potential, and to validate the above-described results. The 13 groundwater-related factors were removed one by one from the dataset, for all three algorithms. The resulting mapping accuracy results allowed us to determine each factor's influence, as shown in Table 2. The SVR model showed accuracy increases of 0.4% and 0.3% on removal of the precipitation and distance to fault factors, respectively. Mapping accuracy decreased by 5.1%, 2.6%, and 2.3% on removal of the land use, soil type, and lithology factors, respectively. Thus, these factors had a considerable influence on the groundwater probability mapping performance of the SVR model. For the SVR_GWO model, when the soil type, NDWI, and elevation factors were removed, mapping accuracy decreased by 1.3%, 1.3%, and 1.2%, respectively. On the other hand, removal of the TWI, distance to fault, and water density factors had no effect on accuracy, while removal of the precipitation factor increased accuracy by 0.3%. For the SVR_PSO model, the accuracy decreased by 1.7% and 1.4% when the NDWI and lithology factors were removed, respectively. These relationships can be explained by the fact that NDWI represents the wetness of an area, and lithology describes the characteristics of the geological structure, which is important for groundwater presence. Overall, the results were acceptable and indicate that the algorithms can be usefully applied to this research area.
Apart from the influence of these factors, sensitivity analysis can evaluate the consistency of the map. This study shows that some factors make a dominant contribution to the mapping accuracy, namely, the land use and lithology conditions of the area. Land use can affect the rate of groundwater recharge because it influences evapotranspiration, runoff, and the recharge system [54]. Lithology conditions can indicate the storage potential and aquifer conditions in the area leading to water sources. Lithology can also influence the infiltration and percolation of water flow [69]. However, further hydrogeological analysis can be used as a comparison to better understand the factors that influence groundwater recharge conditions.
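The leave-one-factor-out procedure described above can be sketched as follows. The function `factor_sensitivity` and the `evaluate` callable, which stands in for retraining a model on a given factor list and returning its accuracy, are hypothetical names introduced for illustration, and the toy accuracy contributions are invented.

```python
def factor_sensitivity(factors, evaluate):
    """Remove each factor in turn and report the change in accuracy versus the baseline."""
    baseline = evaluate(list(factors))
    changes = {}
    for factor in factors:
        reduced = [f for f in factors if f != factor]
        # Negative change: accuracy dropped without this factor, so it is influential.
        changes[factor] = evaluate(reduced) - baseline
    return baseline, changes


# Toy stand-in for model retraining: each factor adds a fixed amount of accuracy.
contribution = {"land use": 0.05, "lithology": 0.02, "precipitation": -0.01}

def toy_evaluate(selected):
    return 0.75 + sum(contribution[f] for f in selected)

baseline, changes = factor_sensitivity(list(contribution), toy_evaluate)
for factor, change in changes.items():
    print(f"{factor}: {change:+.3f}")
```

In this toy case, removing precipitation raises the accuracy slightly, which mirrors the pattern reported for the precipitation factor.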

Discussion
Groundwater is essential for human life and livelihoods. Groundwater mapping can be improved, and rendered more cost-effective, by identifying geophysical and hydrological factors associated with subsurface storage. This study developed a machine learning approach for analyzing RS and GIS data to map groundwater potential; combining statistical models and machine learning algorithms (FR and SVR) can facilitate scientific decision-making as it pertains to a variety of problems [85]. FR values are an efficient way to simplify equations and aid interpretation of results [86]. Integration of RS and GIS data can improve time and cost efficiency in the context of groundwater probability mapping [17,57]. Machine learning algorithms can be used to model natural phenomena involving factors with nonlinear relationships, and so were applied to map groundwater potential in this study. These models can handle and analyze the complex nonlinear relationships between groundwater potential and various factors.
The groundwater potential mapping results that we obtained using the SVR machine learning algorithm showed that the study area could be classified based on groundwater probability. Areas with high groundwater potential were associated with alluvium, which was distributed around rivers and tributaries that flow into the sea. Alluvium is a type of aquifer with sufficient porosity and permeability to allow water to accumulate [58,87]. In addition, land use also influences groundwater potential; the high-probability areas were detected near agricultural and urban areas [51], which are also the land use types that use the most groundwater. Groundwater resource availability is affected by human activities such as urbanization and land cover change, which alter recharge rates. Precipitation can also affect groundwater potential because it is related to groundwater recharge conditions. However, high precipitation does not always increase groundwater recharge: higher precipitation can increase surface water overflow and hinder the process of groundwater recharge [70]. The infiltration rate, which is indirectly influenced by soil conditions, and the saturation level of water absorption are other factors to consider. Meanwhile, in the forest land use areas, it is possible that many wells contain groundwater, considering that 70% of the study area is forest. The influence of the groundwater-related factors was examined by the sensitivity analysis, in which mapping accuracy decreased by 5.1%, 2.3%, and 2.6% when land use, lithology, and soil type, respectively, were not included in the SVR process. Thus, these three factors, which are commonly used in groundwater studies, play an important role in the infiltration rate, which in turn governs groundwater presence.
On the other hand, the high groundwater potential found in the coastal area requires further observation regarding the effect of sea water intrusion on the presence of groundwater. Sea water can fill pores in the soil structure in coastal areas; this has the potential to occur in Gangneung and affect groundwater conditions [88]. Several studies also argue for the potential for sea water intrusion in coastal areas [89,90]. A field investigation of groundwater conditions, especially around the coast, is therefore necessary [70], as is an evaluation of other spatial factors that can affect groundwater in further research. Nevertheless, this preliminary study provides an overview of the groundwater potential in Gangneung.
ROC curve analysis was performed to assess the performance of the three algorithms. The AUC values of the SVR, SVR_GWO, and SVR_PSO models were 0.803, 0.878, and 0.814, respectively. Thus, SVR_GWO achieved the highest AUC value among the algorithms in this research. The accuracy values of all algorithms were good, so these models should be useful for this area of research. Application of the GWO and PSO optimization algorithms increased the SVR model accuracy by 8.6% and 1.6%, respectively. The optimization process tuned the gamma, epsilon, and C parameters in the SVR algorithm [36]. The optimization aimed to minimize the root mean square error (RMSE), which indicates the discrepancy between observations and predictions; lower RMSE values indicate higher model quality [80]. The SVR algorithm was used for groundwater potential mapping because it has several advantages: a good solution is not required at the start of the iterative process, it can be used in conjunction with other optimization methods, and multi-input, nonlinear optimization problems can be solved efficiently [40]. The increased accuracy of SVR achieved by applying the optimization algorithms is consistent with the results of Al-Fugara et al. (2019), who reported that spatial mapping of groundwater was improved by using "SVR-RBF-GA" and "SVR-RBF-GS" methods [34]. Furthermore, better landslide mapping performance has been reported using a hybrid SVR method compared with the adaptive neuro-fuzzy inference system (ANFIS) [40]. However, there are also some disadvantages of SVR; for example, the absence of data mining features means that data are classified immediately, although labeled data can be extracted, which increases classification accuracy [35].
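To make the tuning objective concrete, the sketch below shows how a candidate position proposed by an optimizer, interpreted as the triple (C, gamma, epsilon), could be scored by RMSE. The names `rmse`, `make_objective`, and `fit_predict` are hypothetical; `fit_predict` stands in for training an SVR with the given parameters and returning its predictions on held-out data, and is not part of any specific library.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observations and predictions."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def make_objective(fit_predict, observed):
    """Wrap a model so that an optimizer can score a (C, gamma, epsilon) candidate."""
    def objective(position):
        c, gamma, epsilon = position
        return rmse(observed, fit_predict(c, gamma, epsilon))
    return objective
```

An optimizer such as GWO or PSO would then search the bounded (C, gamma, epsilon) space for the position that yields the lowest value of this objective.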
This study focuses on the application of machine learning methods and their integration with remote sensing for groundwater potential mapping. However, the study still has limitations, especially regarding the explanation of the physical water balance and the groundwater recharge mechanism. Hydrological and phreatic surface analysis is still needed as a cross-check validation of the groundwater potential map [91]. This is necessary to obtain an overview of groundwater recharge and can provide further information to improve understanding of the physical water balance. In terms of methodology, the use of an optimization algorithm offers a promising alternative approach for future groundwater mapping. In particular, there are still potential advantages in combining expert experience with deep learning applications that have data mining features [14,92]. This work is therefore a preliminary study of groundwater mapping that combines a hybrid algorithm with groundwater capacity data. Further studies can construct a phreatic surface of the aquifer to obtain a more detailed picture of the relationship between the factors and groundwater potential.

Summary and Conclusions
A groundwater potential map was constructed for the Gangneung-si area using machine learning methods, based on 13 groundwater-related factors (mainly categorized as topography, hydrology, geology, and land use factors). FR values were used to assess the correlations of the factors with groundwater occurrence. Groundwater data from 285 wells in Gangneung-si were analyzed; these data were divided randomly into training (70%) and testing (30%) datasets. The GWO and PSO optimization algorithms were applied to the SVR. ROC curve analysis was used to assess the accuracy of each model. A sensitivity test was also performed to link the related factors with groundwater probability. The land use and lithology factors were the most influential factors in constructing the groundwater potential maps in this research.
In summary, three models were developed for reasonable groundwater potential mapping: SVR, SVR_GWO, and SVR_PSO. The AUC values were 0.803, 0.878, and 0.814, respectively. Thus, the SVR_GWO model was more effective for evaluating groundwater potential than the SVR and SVR_PSO models. Furthermore, the variation in accuracy attributable to each factor could be small because all of the selected variables are important in the analysis. Nevertheless, this approach using a hybrid algorithm could be useful for groundwater mapping and exploration development. Owing to some uncertainties associated with the methods, sample size, and raster spatial resolution, further studies aimed at developing accurate and reliable regional-scale groundwater potential maps are needed. Other related factor analysis methods and further hydrological analysis could be used to improve groundwater potential mapping results.

Data Availability Statement:
The dataset used for the groundwater-related factor analysis can be found at: https://www.bigdata-environment.kr/user/main.do.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study, in the collection, analysis, or interpretation of the data, in the writing of the manuscript, or in the decision to publish the results.