Article

Evaluation of Logistic Regression and Multivariate Adaptive Regression Spline Models for Groundwater Potential Mapping Using R and GIS

1
BK21 Plus Project of the Graduate School of Earth Environmental Hazard System, Pukyong National University, Busan 48513, Korea
2
Department of Geological Sciences, Pusan National University, Busan 46241, Korea
3
Department of Spatial Information Engineering, Pukyong National University, Busan 48513, Korea
*
Author to whom correspondence should be addressed.
Sustainability 2017, 9(7), 1157; https://doi.org/10.3390/su9071157
Submission received: 7 June 2017 / Revised: 28 June 2017 / Accepted: 29 June 2017 / Published: 2 July 2017
(This article belongs to the Section Sustainable Engineering and Science)

Abstract

This study mapped and analyzed groundwater potential using two different models, logistic regression (LR) and multivariate adaptive regression splines (MARS), and compared the results. A spatial database was constructed for groundwater well data and groundwater influence factors. Groundwater well data with a high potential yield of ≥70 m³/d were extracted, and 859 locations (70%) were used for model training, whereas the other 365 locations (30%) were used for model validation. We analyzed 16 groundwater influence factors including altitude, slope degree, slope aspect, plan curvature, profile curvature, topographic wetness index, stream power index, sediment transport index, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover. Groundwater potential maps (GPMs) were constructed using the LR and MARS models and tested using receiver operating characteristic (ROC) curves. Based on this analysis, the areas under the curve (AUC) for the success rate curves of the GPMs created using the MARS and LR models were 0.867 and 0.838, and the AUCs for the prediction rate curves were 0.836 and 0.801, respectively. This implies that the MARS model is useful and effective for groundwater potential analysis in the study area.

1. Introduction

Groundwater is defined as water in the saturated zone that fills the pore spaces between mineral grains and the cracks and fractures within a rock mass [1]. It results from the interactions of climatic, geological, hydrological, physiographical, and ecological factors [2]. Globally, groundwater makes up 50% of the present potable water supplies, 40% of the industrial water demand, and 20% of the water used for irrigation [3]. Therefore, it is not only an essential element of life, but also an essential natural resource. Due to the rapid population increase and economic development, the demand for groundwater resources for agricultural, industrial, and potable uses has been increasing [4]. Because groundwater is a limited resource, it is necessary to devise effective and efficient plans to use it based on an understanding of the behavior of groundwater systems and identification of the current status of the local groundwater system through groundwater exploration [5].
Traditional methods of exploring groundwater, which is a hidden natural resource, include drilling, geophysical, geological, and hydro-geological methods. However, such methods entail large expenses and the use of time and human resources for field surveys [6,7]. Groundwater potential maps (GPMs), based on geographic information system (GIS) and remote sensing (RS) data, have been widely used to solve this problem. GIS offers suitable alternatives for the effective management of large and complex geospatial databases [8]. In addition, it can be useful for groundwater exploitation and groundwater resource conversion, as it provides insights into the future availability of groundwater resources [9,10].
GPM has been applied with various methods, including the frequency ratio (FR) [4,11,12,13], logistic regression (LR) [14,15], weight of evidence [15,16], multi-criterion decision analysis [17,18], evidential belief function [19,20,21], index of entropy [12,22], and certainty factor [18]. With the recent rapid development of information technology and database technology, data mining algorithms are now being applied to diverse areas beyond information technology [23]. The fields of geology and hydrogeology have also widely used artificial neural networks, random forests, support vector machines, and decision trees for mapping landslide susceptibility [24,25,26,27,28], gullies [29], mineral potential [30,31,32], groundwater potential [16,20,33,34,35], and groundwater levels [36,37,38]. In recent years, techniques such as K-nearest neighbor (KNN) [39], linear discriminant analysis (LDA) [40], multivariate adaptive regression splines (MARS) [41,42], and quadratic discriminant analysis (QDA) [9,43] have also been used.
As shown in previous studies, diverse data mining techniques can be employed for groundwater potential mapping, but few studies have applied them, and very few have attempted to create GPMs using the MARS model. The purpose of this study was to create GPMs using the MARS model, a data mining technique, alongside the widely used LR model. The performance of these GPMs was compared using receiver operating characteristic (ROC) curves. The ultimate aim was to evaluate the efficacy of the MARS model for creating GPMs.

2. Study Area

This study was conducted at a site in Buyeo-gun, Chungcheongnam-do, Korea, with a surface area of 625 km², located between 126°44′ and 127°03′ east longitude and 36°04′ and 36°23′ north latitude (Figure 1). The total population of Buyeo-gun was 71,143 in 2015, of which 31.2% or 22,213 individuals were engaged in farming [44]. The elevation of the study area is 0–640 m, and 72.8% of the overall area is lowland with an elevation of 100 m or less. The study area is a basin with high temperatures and a large daily temperature range in the summer, as well as large amounts of dew and fog due to the influence of the Geumgang River. In 2015, the annual precipitation was 848.8 mm, and more than half of it fell in the summer. The annual mean temperature is 12.9 °C, with a maximum temperature of 35.8 °C in the summer and a minimum winter temperature of −14.2 °C [44]. In terms of land cover, most of the study area is composed of forest (47.0%) and agricultural (40.1%) areas. The lithology indicates that 52.05% of the study area is covered with metamorphic rock. The study area contains one river and 51 streams. As of 2015, the water and sewer distribution rates in the study area were 73.6% and 50.6%, respectively. In 2015, Buyeo-gun used 35,899,226 m³ of groundwater, which is about 8% of the total groundwater use of Chungcheongnam-do (475,376,469 m³/year). Most of this water is used for farming (about 71%; [45]). Considering that other cities and districts in Chungcheongnam-do only use about 7% of groundwater annually, Buyeo-gun has a relatively high dependence on groundwater.

3. Materials and Methods

In this study, a GPM was created through the three major steps described below. The first step was spatial database construction, in which a spatial database was created containing groundwater well locations and groundwater influence factors. The second step was groundwater potential assessment. LR and MARS models were used to analyze the relationships between well location and groundwater influence factors, and a GPM from each model was created for the overall study area. The third step was the validation process. The performance of the GPM created by each model was evaluated using ROC curves. A flow chart of the methodology used in this study is presented in Figure 2.

3.1. Data Preparation

3.1.1. Well Data

The groundwater well data used in this study were collected from extensive field surveys and governmental reports. Well water was used for a variety of purposes including livestock, farming, and human drinking water. The groundwater yield of each well was calculated from the results of a pumping test. The groundwater potential was based on the prediction of the best potential for groundwater extraction in the study area [9]. Based on previous studies and groundwater productivity reports, an actual pumping test was conducted using the groundwater well data for the study area. High productivity was defined as a yield of ≥70 m³/d. The groundwater productivity data from 1224 wells were selected and randomly divided into a training dataset containing 70% (859 wells) and a validation dataset containing 30% (365 wells). In addition, it was necessary to sample locations without groundwater wells. The same number of non-well locations as training wells (859) was selected and assigned a value of 0 for application to the LR and MARS models. Figure 1 shows the locations of the groundwater wells used in this study.
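The sampling scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the well identifiers, the pool of candidate non-well cells, and the random seeds are hypothetical placeholders, and the paper does not say how the random selection was actually implemented.

```python
import random

def split_wells(well_ids, n_train, seed=0):
    """Randomly split well IDs into a training list and a validation list."""
    rng = random.Random(seed)
    shuffled = list(well_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 1224 high-yield wells; the IDs are placeholders, not real well records.
wells = list(range(1224))
train_wells, valid_wells = split_wells(wells, n_train=859)

# Hypothetical pool of raster cells that contain no well.
non_well_cells = list(range(100_000, 110_000))
absence_cells = random.Random(1).sample(non_well_cells, k=len(train_wells))

# Training table of (location ID, label): 1 = well present, 0 = well absent.
training_table = [(w, 1) for w in train_wells] + [(c, 0) for c in absence_cells]

print(len(train_wells), len(valid_wells), len(training_table))
```

A balanced table of 859 + 859 = 1718 rows is consistent with the 1717 null degrees of freedom reported for the LR model in Section 4.2.1.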

3.1.2. Groundwater Influencing Factors

As presented in Table 1 and Figure 3, 16 groundwater influence factors were used in this study. These factors were largely divided into topographical, hydrological, geological, and land cover factors. Groundwater influence factors were created using ArcGIS 10.2 software (ESRI, Redlands, CA, USA), and were converted into a raster file with a spatial resolution of 10 × 10 m prior to use for groundwater potential assessment.
Topographic factors included altitude, slope degree, slope aspect, plan curvature, and profile curvature. Areas with different elevations have notable differences in weather and climatic conditions, which lead to differences in soil conditions and vegetation [46]. The slope degree is a factor that is mainly used to determine groundwater recharge processes, as gentle slope areas have a low surface runoff and high rates of percolation, while the opposite is true for high slope areas [19]. The slope aspect is a factor related to precipitation direction and physiographic trends, and it affects the soil water content [12]. Curvature represents the morphology of the topography, and is composed of three aspects: profile, plan, and total, the latter of which combines profile and plan. Profile curvature and plan curvature mainly affect the acceleration and deceleration of flow, as well as flow convergence and divergence, on the ground surface [22]. These factors were extracted from the digital elevation model (DEM) using a spatial analyst tool of ArcGIS 10.2 software. The DEM was created using contour lines and points extracted from a 1:5000 digital map provided by the Korean National Geographic Information Institute. The ArcGIS triangular irregular network (TIN) module was used for this process, and the generated TIN was converted into a raster file with a pixel size of 10 m.
Hydrologic factors such as the topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), distance from drainage, and drainage density were considered when estimating the flow of surface water and groundwater according to topographical factors. TWI is a secondary topographic index that has been used to describe spatial moisture patterns and explain the effects of topographic conditions on these patterns [47]. It plays an important role in influencing the movement and accumulation of runoff on the ground surface [11]. SPI is a factor that estimates the degree of slope erosion due to flowing water. TWI and SPI are calculated using the following equations [47]:
$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \tag{1}$$
$$\mathrm{SPI} = A_s \tan\beta \tag{2}$$
where $A_s$ is the cumulative upslope area and $\beta$ is the slope gradient. STI combines slope steepness and slope length, and is used to measure the sediment transport capacity of overland flow within the universal soil loss equation [48]. STI is calculated using the following equation [49]:
$$\mathrm{STI} = \left(\frac{A_s}{22.13}\right)^{0.6}\left(\frac{\sin\beta}{0.0896}\right)^{0.6} \tag{3}$$
where $A_s$ is the cumulative upslope area and $\beta$ is the slope gradient. Drainage lines were used to create a drainage density map and a distance map of the study area. Drainage density has an inverse relationship with permeability. A high drainage density decreases infiltration and increases surface runoff, and is therefore not favorable for groundwater development [50]. The drainage density is calculated by dividing the sum of drainage lengths (km) in the corresponding cell by its surface area (km²). The drainage density and distance from drainage were determined using the line density tool and Euclidean distance tool in ArcGIS 10.2 software, respectively.
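As a small worked illustration of the TWI and SPI formulas, the sketch below evaluates both indices for a single cell. It assumes the slope is supplied in degrees and clamps it to a small positive minimum so that flat cells do not cause a division by zero; that clamp and the example inputs are assumptions made for this sketch, not part of the study's ArcGIS workflow.

```python
import math

def twi(upslope_area, slope_deg, min_slope_deg=0.01):
    """Topographic wetness index: ln(A_s / tan(beta))."""
    # Clamping the slope is an assumption to keep flat cells finite.
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(beta))

def spi(upslope_area, slope_deg):
    """Stream power index: A_s * tan(beta)."""
    return upslope_area * math.tan(math.radians(slope_deg))

# Hypothetical cell with a fixed upslope area and three slope values.
area = 100.0
for slope in (2.0, 10.0, 30.0):
    print(slope, round(twi(area, slope), 2), round(spi(area, slope), 2))
```

For a fixed upslope area, TWI falls and SPI rises as the slope steepens, which matches the interpretation of gentle slopes as wetter and steep slopes as more erosive.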
Geological factors affect the porosity and permeability of aquifer materials, and are thus considered indicators of hydrological features. Geological factors are composed of lithology, distance from fault, fault density, distance from lineament, and lineament density. These factors were determined using a digital geological map at a 1:50,000 scale, obtained from the Korea Institute of Geoscience and Mineral Resources.
The study area was divided into 37 lithology units according to the type of lithology and geological age. In this study, these lithology units were classified according to their characteristics into metamorphic rock, sedimentary rock A, sedimentary rock B, igneous rock, and dike and talus. Here, sedimentary rocks were classified based on permeability. Sedimentary rock A is permeable rock made of sandstone or gravel, whereas sedimentary rock B is impermeable rock made of shale or clay. Faults extracted from the digital geology map were used to determine the distance from the fault and fault density. A lineament is defined as a straight or slightly curved surface feature of natural origin directly observed from the image [51,52]. Because lineaments are related to discontinuities such as joints, faults, and folds, they have been used for structural analysis, lithological relationship analysis, and groundwater productivity assessment [13]. In this study, lineaments were extracted from a hill-shaded map created from the DEM using the geological and geophysical analysis tool in Geomatica 2016 software (PCI Geomatics, Markham, ON, Canada). The hill-shaded map was created by combining images in three directions where the sun altitude was 45° and the sun azimuth was 45°, 90°, or 135°. The extracted lineament lines were used to determine the distance from the lineament and lineament density using the Euclidean distance tool and line density tool in ArcGIS 10.2 software.
Land cover represents the biological state of a geographic feature on the surface of the earth, and has been used to demarcate groundwater availability [22]. The type of land cover contributes to variation in the soil condition, and discontinuities resulting from this variation affect the occurrence, storage, and movement of groundwater. In this study, land cover factors were constructed using a digital land cover map provided by the Ministry of Environment. The digital land cover map was prepared on a 1:25,000 scale using Korea Multi-Purpose SATellite-2, and this map included 22 land cover categories that were reclassified into seven groups including urban, farmland, forest, grassland, wetland, bare land, and water.

3.2. Groundwater Potential Mapping

In this study, groundwater potential was assessed using the LR and MARS models. The LR and MARS models were fitted using the glm function and the “earth” package, respectively, in R 3.3.0 software (R Foundation for Statistical Computing, Vienna, Austria).

3.2.1. Logistic Regression

The LR model is a type of multivariate regression used to explain the relationships among a dichotomous dependent variable coded into 0 and 1, and one or more categorical or numerical independent variables [53]. In this study, the dependent variable was a binary variable indicating the presence or absence of groundwater wells, with a value of 1 or 0, respectively. Independent variables were the groundwater influencing factors that affect the groundwater wells. In general, the LR model can be expressed as follows [14,21]:
$$P = \frac{e^{Z}}{1 + e^{Z}} \tag{4}$$
where $P$ is the probability of an occurrence and $Z$ is the linear combination function of the independent variables showing a linear relationship. $Z$ can be expressed as follows:
$$Z = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \tag{5}$$
where $\alpha$ is the intercept, $n$ is the number of independent variables, $\beta_1, \ldots, \beta_n$ are the regression coefficients that represent the contribution of each independent variable to the probability value ($P$), and $x_1, \ldots, x_n$ are the independent variables. In Equation (5), the value of $Z$ ranges from −∞ to +∞. Positive regression coefficients indicate that the dependent variable has a positive correlation with the independent variables, whereas negative regression coefficients indicate that the independent variables have a negative effect on the dependent variable. However, if the dependent variable is a binary variable divided into presence and absence, as in this study, the value of the dependent variable is coded as 1 and 0. Thus, the predicted value of the dependent variable is a probability estimate. The probability value has an upper limit of 1 and a lower limit of 0, and the relationship between the dependent variable and this probability cannot be expressed as a linear function. Therefore, the upper and lower limits of the probability are removed by converting the probability into a logit function. The relationship between the dependent variable and logit can be expressed as a linear function. Probability can be converted into a logit function using the following equation:
$$\mathrm{Logit}(P) = \ln\left(\frac{P}{1 - P}\right) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \tag{6}$$
where $P/(1 - P)$ is the odds or likelihood ratio representing the ratio between the probability $P$ that the dependent variable is present and the probability $1 - P$ that the dependent variable is absent. The natural logarithm of the odds is called logit($P$) and is a linear function of the independent variables ranging from −∞ to +∞. To be more precise, if the value of the probability $P$ increases, the value of logit($P$) also increases [14].
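The relationship between the linear predictor, the probability, and the logit can be checked numerically. The sketch below is a minimal illustration in Python; the intercept, coefficients, and factor values are made up for the example and are not the fitted values from this study.

```python
import math

def linear_predictor(intercept, coefs, xs):
    """Z = alpha + beta_1 * x_1 + ... + beta_n * x_n."""
    return intercept + sum(b * x for b, x in zip(coefs, xs))

def probability(z):
    """P = e^Z / (1 + e^Z), written to avoid overflow for large |Z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p):
    """Natural logarithm of the odds P / (1 - P)."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept, coefficients, and factor values.
z = linear_predictor(0.5, [-0.2, 0.25], [3.0, 2.0])
p = probability(z)
print(round(z, 3), round(p, 3), round(logit(p), 3))
```

Applying the logit to the probability returns the linear predictor, which is why the regression can be fitted on the unbounded logit scale while predictions stay between 0 and 1.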

3.2.2. Multivariate Adaptive Regression Splines Model

The MARS model is a statistical method introduced by Friedman (1991) that is used to fit the relationship between the dependent and independent variables. It is a nonlinear and nonparametric regression method that combines classic linear regression, the mathematical construction of splines, binary recursive partitioning, and brute and intelligent algorithms [54,55]. A benefit of this method is that specific assumptions of the underlying functional relationship between the independent and dependent variables are unnecessary [56,57]. The MARS model predicts a function using linear combinations and interactions of the adaptive piecewise linear regression known as the “basis function (BF)”. Accordingly, f(X) of the MARS model can be expressed by the following equation [42,57]:
$$f(X) = \beta_0 + \sum_{i=1}^{n} \beta_i \lambda_i(X) \tag{7}$$
where $\beta_0$ is a constant, $\beta_i$ is the coefficient of the $i$th BF, $\lambda_i(X)$ is a BF, and $n$ is the number of BFs in the model. All of the coefficients were estimated using the least-squares method. The BFs are functions that take the following form [58]:
$$\max(0,\, x - \alpha) \quad \text{or} \quad \max(0,\, \alpha - x) \tag{8}$$
where $x$ is an independent variable and $\alpha$ is a constant corresponding to a knot (or hinge). Two adjacent splines intersect at the knot to maintain the presence of the BF [33]. The MARS model was developed in two steps: the forward stepwise algorithm and the backward stepwise algorithm. The first step, the forward stepwise algorithm, adds BFs to Equation (7) and finds potential knots to obtain a better model performance. However, obtaining too many BFs in this process can result in overfitting the MARS model. The second step, the backward stepwise algorithm, is used to lessen this problem. In this step, redundant BFs that have the smallest contributions to the model are removed from the BFs selected by the forward stepwise algorithm to find the best sub-model. Generalized cross-validation (GCV) is used to remove redundant BFs from the MARS model, and is calculated as follows [23,37]:
$$\mathrm{GCV} = \frac{\dfrac{1}{N}\sum_{i=1}^{N}\left[y_i - f(X_i)\right]^2}{\left[1 - \dfrac{M + d \times \dfrac{M - 1}{2}}{N}\right]^2} \tag{9}$$
where $N$ is the number of observations, $y_i$ is the observed value, $f(X_i)$ is the predicted value of the MARS model, $M$ is the number of BFs, and $d$ is the penalizing parameter. If the value of $d$ is large, it can result in a small number of knots being used. The optimum value of $d$ is considered to be in the range of 2 ≤ $d$ ≤ 4 [56]. In this study, a default value of 3 was used for $d$.
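To make the hinge-shaped basis functions and the GCV penalty concrete, the following Python sketch evaluates a mirrored pair of BFs at a knot and computes the GCV score as written above. The toy observations, predictions, and numbers of BFs are invented for illustration; this is not the fitting procedure of the "earth" package, only the arithmetic of the two formulas.

```python
def hinge_pair(x, knot):
    """Return the two mirrored basis functions max(0, x - knot) and max(0, knot - x)."""
    return max(0.0, x - knot), max(0.0, knot - x)

def gcv(observed, predicted, n_basis, penalty=3.0):
    """Generalized cross-validation score with penalizing parameter d = penalty."""
    n = len(observed)
    mse = sum((y - f) ** 2 for y, f in zip(observed, predicted)) / n
    effective_params = n_basis + penalty * (n_basis - 1) / 2.0
    return mse / (1.0 - effective_params / n) ** 2

# Only one member of each pair is non-zero on either side of the knot.
print(hinge_pair(5.0, 3.0), hinge_pair(1.0, 3.0))

# Toy data: 20 binary observations and hypothetical model predictions.
observed = [1, 0] * 10
predicted = [0.8, 0.2] * 10
print(round(gcv(observed, predicted, n_basis=3), 4))
print(round(gcv(observed, predicted, n_basis=5), 4))
```

With identical residuals, the model with more BFs receives a larger GCV, which is the mechanism the backward step uses to discard BFs that do not reduce the error enough to justify their cost.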

4. Results

4.1. Preliminary Analysis

In this study, multicollinearity and FRs were analyzed as preliminary analyses prior to groundwater potential assessment.

4.1.1. Multicollinearity Analysis

Multicollinearity refers to a linear relationship that exists among two or more variables. If multicollinearity exists among the independent variables during regression analyses, the variance of the regression coefficient increases. Error also increases with multicollinearity, reducing the accuracy of the model’s prediction. Multicollinearity can be assessed by various means, and tolerance (TOL) and variance inflation factor (VIF) assessments were used in this study. TOL and VIF indicate multicollinearity among independent variables when their values are ≤0.1 and ≥10, respectively [59]. The results of the multicollinearity analysis on the 16 independent variables used in this study are presented in Table 2. This analysis showed that the TOL and VIF of all variables used in this study were ≥0.1 and ≤10, respectively. This suggests that there was no problem of multicollinearity among the independent variables used in this study, so LR analyses were performed with all of the variables.
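The TOL and VIF screening can be illustrated with a short Python sketch. It relies on the standard identity that the VIF of each variable equals the corresponding diagonal element of the inverse correlation matrix, with TOL = 1/VIF. The synthetic variables below are assumptions chosen to show one acceptable and one collinear case; they are not the study's 16 factors.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif_and_tol(columns):
    """VIF = diagonal of the inverse correlation matrix; TOL = 1 / VIF."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    vif = [inv[j][j] for j in range(k)]
    return vif, [1.0 / v for v in vif]

rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(500)]
x2 = [rng.gauss(0, 1) for _ in range(500)]
x3 = [a + 0.05 * rng.gauss(0, 1) for a in x1]  # near-duplicate of x1

vif_ok, tol_ok = vif_and_tol([x1, x2])
vif_bad, tol_bad = vif_and_tol([x1, x2, x3])
print([round(v, 2) for v in vif_ok], [round(v, 1) for v in vif_bad])
```

In the second case the near-duplicate variable pushes the VIF of `x1` and `x3` far above 10 and their TOL below 0.1, which is the situation the thresholds cited above are meant to flag.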

4.1.2. Spatial Relationship between Groundwater Wells and Influence Factors

The FR was used to analyze the probabilistic relationship between groundwater wells and groundwater influence factors in this study. For each class of a factor, FR is defined as the ratio of the proportion of groundwater wells that occur in the class to the proportion of the study area that the class occupies, and is calculated using the following equation [18]:
$$\mathrm{FR} = \frac{a/b}{c/d} \tag{10}$$
where $a$ is the number of pixels with groundwater wells in a given class of a groundwater influence factor, $b$ is the total number of groundwater wells in the study area, $c$ is the number of pixels in that class, and $d$ is the total number of pixels in the study area. FR is considered to show an average relationship if its value is 1, a high correlation if it is larger than 1, and a low correlation if it is lower than 1 [60]. Among the groundwater influence factors, FR values for continuous factors (e.g., altitude) were calculated after dividing the values into nine classes by quantile classification.
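A brief numerical example may help with reading the FR values. The Python sketch below computes FR for three classes of a single factor; the well and pixel counts are hypothetical, chosen only so that the totals match the 859 training wells and the roughly 6.25 million 10 m pixels implied by a 625 km² study area.

```python
def frequency_ratio(wells_in_class, wells_total, pixels_in_class, pixels_total):
    """FR = (a / b) / (c / d) for one class of one factor."""
    return (wells_in_class / wells_total) / (pixels_in_class / pixels_total)

# Hypothetical counts for a three-class factor.
wells_per_class = [200, 400, 259]
pixels_per_class = [1_000_000, 2_250_000, 3_000_000]
wells_total = sum(wells_per_class)    # 859
pixels_total = sum(pixels_per_class)  # 6,250,000

fr_values = [
    frequency_ratio(a, wells_total, c, pixels_total)
    for a, c in zip(wells_per_class, pixels_per_class)
]
for fr in fr_values:
    print(round(fr, 3))
```

Because the area-weighted mean of FR over all classes of a factor is always 1, a class can only exceed 1 if another class of the same factor falls below it.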
The results of the FR analysis are presented in Table 3. The value of FR for altitude was larger than 1 in the classes covering altitudes of 7–59 m. The value of FR was larger than 1 when the slope was 10.76 degrees or below, and was highest, at 2.92, in the 2.96–6.72 class. Regarding the slope aspect, the values of FR were relatively high for flat, northeast-facing, southeast-facing, and southwest-facing slopes. The value of FR for plan curvature and profile curvature was highest in the flat class and lowest in the convex class. Regarding the TWI, the value of FR was high at 2.42 and 2.09 in the −5.20 to 0.57 and 3.45 to 5.01 classes, respectively, and low at 0.19 in the −7.84 to −5.20 class. SPI had an FR value larger than 1 when the value of SPI was lower than −1.60, except in the −8.60 to −6.26 class. The FR value for STI was high at 1.16 and 1.14 in the 0 and 1–14.80 classes, respectively. Regarding the distance from drainage, the FR value was highest at 1.17 in the 917.47–1190.76 and 1483.56–1834.94 classes and lowest at 0.83 in the 0–195.206 class. For drainage density, the highest FR value was found in the 0.18–0.73 class. For lithology, FR was larger than 1 in the igneous rock and dike and talus classes, indicating a high probability of well occurrence. In the case of distance from a fault, the value of FR was highest at 2.07 in the 2997.87–3997.16 class and lowest at 0.41 in the 0.00–777.23 class. The value of FR for fault density was highest in the 0 class. For distance from a lineament, the highest FR value (1.29) was found in the 443.97–624.85 class and the lowest (0.58) in the 0.00–131.55 class. For lineament density, the highest FR value (1.31) was found in the 0.46–0.60 class and the lowest (0.58) in the >1.09 class. For land cover, the values of FR were high at 2.48, 1.95, and 1.79 in the urban, farmland, and bare land classes, respectively, indicating a high probability of well occurrence.
Based on the results of FR analysis, the classes of groundwater influence factors used in this study had different FR values, and some factors showed a broad range of values. For example, the slope degree had an FR range of 0.02–2.92. In addition, each groundwater influence factor had at least one class with an FR value larger than 1, showing a high correlation with well occurrence. Therefore, the 16 factors used in this study are appropriate for use as groundwater influence factors.

4.2. Groundwater Potential Assessment and Mapping

4.2.1. Application of the Logistic Regression Model

The results of the analysis using the LR model are presented in Table 4. Among the independent variables used in this study, the slope degree, profile curvature 2 (flat), TWI, SPI, distance from drainage, lithology 3 (sedimentary rock B), lithology 4 (igneous rock), fault density, distance from lineament, land cover 3 (forest), land cover 5 (wetland), and land cover 7 (water) had significant effects on groundwater well occurrence at the 5% significance level. According to the β coefficients, altitude, SPI, distance from drainage, and lineament density had positive effects on groundwater well occurrence, whereas the slope degree, TWI, STI, drainage density, distance from fault, and distance from lineament had negative β coefficients, indicating negative effects on groundwater well occurrence. For categorical variables such as the slope aspect, plan curvature, profile curvature, lithology, and land cover, the plan curvature 3 (concave), profile curvature 2 (flat), lithology 2 (sedimentary rock A), lithology 4 (igneous rock), lithology 5 (dike and talus), and land cover 6 (bare land) classes had positive effects on groundwater well occurrence. On the other hand, some classes of slope aspect, plan curvature, profile curvature, lithology, and land cover had negative effects on groundwater well occurrence. In addition, the LR model had a null deviance of 2381.7 with 1717 degrees of freedom, a residual deviance of 1722.9 with 1684 degrees of freedom, and an Akaike information criterion (AIC) of 1790.9.
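The reported goodness-of-fit figures are mutually consistent, which can be verified with a few lines of arithmetic. The Python sketch below uses only the numbers quoted above plus two standard relations: AIC = residual deviance + 2k for a binomial model with k estimated parameters, and a null deviance of 2N ln 2 for an intercept-only model fitted to N observations split evenly between 0 and 1. The assumption that the 1718 training rows are exactly balanced follows from the sampling described in Section 3.1.1.

```python
import math

# Values quoted in the text.
null_deviance = 2381.7
null_df = 1717
residual_deviance = 1722.9
residual_df = 1684
reported_aic = 1790.9

# Number of observations and of estimated parameters (intercept included).
n_obs = null_df + 1                       # 1718 = 859 wells + 859 non-wells
n_params = null_df - residual_df + 1      # 34

# AIC for a binomial GLM: residual deviance + 2 * number of parameters.
aic = residual_deviance + 2 * n_params

# Null deviance expected for a perfectly balanced 0/1 response.
expected_null_deviance = 2 * n_obs * math.log(2)

# Reduction in deviance achieved by the 33 predictor terms.
deviance_drop = null_deviance - residual_deviance

print(n_obs, n_params, round(aic, 1),
      round(expected_null_deviance, 1), round(deviance_drop, 1))
```

The 34 parameters correspond to the intercept plus the continuous factors and the non-reference classes of the categorical factors.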

4.2.2. Application of Multivariate Adaptive Regression Splines Model

The optimal MARS model included 25 terms, and the GCV was 0.165. The MARS model generates the optimal model by only selecting the necessary independent variables [55]. Of the 16 independent variables included in this study, only 10 variables (altitude, slope degree, distance from drainage, drainage density, lithology, distance from fault, fault density, distance from lineament, lineament density, and land cover) were used to construct the optimal model. For the categorical variables, lithology and land cover, only some classes were retained: sedimentary rock A, sedimentary rock B, and igneous rock for lithology, and forest, wetland, and water for land cover. Based on an analysis of the MARS model, a BF was created for each independent variable and each BF had a different β coefficient. Continuous variables have one or more constants corresponding to a knot within the variable, which lead to different effects on groundwater well occurrence.
In the MARS model, it is possible to estimate the relative importance of variables. The selections and contributions of the independent variables are shown in Table 5. Here, nsubset is the number of model subsets that include each variable. Variables that are included in more subsets are considered more important. GCV is the generalized cross-validation score of the model. The GCV criterion first calculates the decrease in the GCV for each subset relative to the previous subset. Then, for each variable, it sums these decreases over all subsets that include that variable. Finally, for ease of interpretation, the summed decreases are scaled so the largest summed decrease is 100. In addition, RSS is the residual sum-of-squares of the model. In the case of RSS and GCV, variables that cause larger net decreases are considered more important [58,61].
Based on Table 5, land cover 3 (forest) was the most important variable explaining the spatial distribution of groundwater wells in the study area, followed by altitude, slope degree, land cover 7 (water), and distance from the fault. These independent variables had lower values (<0.15) for the frequency ratio compared with other variables. In addition, altitude had no significant effect on groundwater well occurrence at the 5% significance level in the LR model. The influence of the independent variables differed depending on the result of FR, LR, and the MARS model.

4.2.3. Groundwater Potential Mapping

The following equations were used to apply the analysis results of the LR and MARS models to the creation of a GPM for the overall study area:
$$\begin{aligned}
\mathrm{GPM}_{\mathrm{LR}} ={}& 5.016 + (0.002 \times \text{Altitude}) - (0.213 \times \text{Slope degree}) \\
&- (1.412 \times \text{Slope aspect 2 [N]}) - (0.981 \times \text{Slope aspect 3 [NE]}) - (1.250 \times \text{Slope aspect 4 [E]}) \\
&- (1.286 \times \text{Slope aspect 5 [SE]}) - (1.126 \times \text{Slope aspect 6 [S]}) - (1.365 \times \text{Slope aspect 7 [SW]}) \\
&- (1.384 \times \text{Slope aspect 8 [W]}) - (1.498 \times \text{Slope aspect 9 [NW]}) \\
&- (0.036 \times \text{Plan curvature 2 [Flat]}) + (0.278 \times \text{Plan curvature 3 [Concave]}) \\
&+ (0.629 \times \text{Profile curvature 2 [Flat]}) - (0.051 \times \text{Profile curvature 3 [Concave]}) \\
&- (0.367 \times \text{TWI}) + (0.251 \times \text{SPI}) - (5.79 \times 10^{-4} \times \text{STI}) \\
&+ (2.28 \times 10^{-4} \times \text{Distance from drainage}) - (0.020 \times \text{Drainage density}) \\
&+ (0.172 \times \text{Lithology 2 [Sedimentary rock A]}) - (2.873 \times \text{Lithology 3 [Sedimentary rock B]}) \\
&+ (0.581 \times \text{Lithology 4 [Igneous rock]}) + (0.187 \times \text{Lithology 5 [Dike and talus]}) \\
&- (1.01 \times 10^{-5} \times \text{Distance from fault}) - (0.717 \times \text{Fault density}) \\
&- (2.28 \times 10^{-4} \times \text{Distance from lineament}) + (0.227 \times \text{Lineament density}) \\
&- (0.332 \times \text{Land cover 2 [Farmland]}) - (1.992 \times \text{Land cover 3 [Forest]}) - (0.962 \times \text{Land cover 4 [Grassland]}) \\
&- (3.055 \times \text{Land cover 5 [Wetland]}) + (0.980 \times \text{Land cover 6 [Bare land]}) - (2.717 \times \text{Land cover 7 [Water]})
\end{aligned} \tag{11}$$
$$\begin{aligned}
\mathrm{GPM}_{\mathrm{MARS}} ={}& 3.241 - (0.041 \times \max(0,\, \text{Altitude} - 13)) - (0.046 \times \max(0,\, 28 - \text{Altitude})) \\
&+ (0.303 \times \max(0,\, \text{Slope degree} - 5.15)) - (0.370 \times \max(0,\, \text{Slope degree} - 5.44)) \\
&+ (0.061 \times \max(0,\, \text{Slope degree} - 12.18)) \\
&- (0.001 \times \max(0,\, \text{Distance from drainage} - 150.33)) - (0.001 \times \max(0,\, 2020 - \text{Distance from drainage})) \\
&+ (0.001 \times \max(0,\, \text{Distance from drainage} - 2020)) \\
&- (0.115 \times \max(0,\, \text{Drainage density} - 0.89)) + (0.180 \times \max(0,\, \text{Drainage density} - 1.85)) \\
&- (0.070 \times \max(0,\, \text{Drainage density} - 3.72)) \\
&+ (0.090 \times \text{Lithology 2 [Sedimentary rock A]}) - (0.232 \times \text{Lithology 3 [Sedimentary rock B]}) \\
&+ (0.078 \times \text{Lithology 4 [Igneous rock]}) \\
&+ (2.32 \times 10^{-4} \times \max(0,\, \text{Distance from fault} - 2801.79)) - (4.82 \times 10^{-4} \times \max(0,\, \text{Distance from fault} - 3548.03)) \\
&+ (2.47 \times 10^{-4} \times \max(0,\, \text{Distance from fault} - 4105.85)) + (0.298 \times \max(0,\, 0.33 - \text{Fault density})) \\
&- (0.01 \times \max(0,\, 226.72 - \text{Distance from lineament})) - (0.143 \times \max(0,\, 0.677 - \text{Lineament density})) \\
&- (0.280 \times \text{Land cover [Forest]}) - (0.292 \times \text{Land cover [Wetland]}) - (0.342 \times \text{Land cover [Water]})
\end{aligned} \tag{12}$$
When the GPM was classified into groundwater potential zones using four classification techniques (natural breaks, quantile, equal interval, and geometrical interval) and the distribution of training and validation groundwater wells in the high and very high zones was compared, the quantile classification technique was the most accurate [15]. Based on this finding, the GPMs created using the LR and MARS models were classified into very low, low, moderate, high, and very high groundwater potential zones using the quantile classification technique. Figure 4 shows the GPMs created by the LR and MARS models.
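Quantile classification places roughly the same number of cells in each class. A minimal sketch using only the Python standard library is shown below; the scores are hypothetical, and the break-point convention of `statistics.quantiles` may differ slightly from the one used by GIS software.

```python
from bisect import bisect_left
from statistics import quantiles

LABELS = ["very low", "low", "moderate", "high", "very high"]


def quantile_classify(scores):
    """Assign each score to one of five classes with (nearly) equal counts."""
    cuts = quantiles(scores, n=len(LABELS))  # four interior break points
    return [LABELS[bisect_left(cuts, s)] for s in scores]


# Hypothetical groundwater potential index values for ten cells.
scores = [0.05, 0.12, 0.20, 0.33, 0.41, 0.58, 0.66, 0.74, 0.87, 0.95]
classes = quantile_classify(scores)
for s, c in zip(scores, classes):
    print(s, c)
```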
The surface area of the GPM created using the LR model in high and very high zones was 248 km², which is 39.7% of the overall surface area. In addition, the surface area of the GPM created with the MARS model in high and very high zones was 251 km² (40.1%), which is slightly larger than that of the GPM created using the LR model. Comparing the surface areas of the GPM in each zone, the difference in the surface area of the very high zone (0.62 km²) was not large. However, the surface area of the GPM created with the MARS model in the high zone was 1.78 km² larger than that in the GPM created using the LR model, a relatively large difference in surface area.

4.3. Validation and Comparison

ROC curves were used to evaluate the performance of the GPMs created in this study. An ROC curve is a technique used to describe the efficiency of probabilistic and deterministic detection and prediction systems [62], and is formed by plotting the true positive rate (sensitivity) on the Y-axis against the false positive rate (1 − specificity) on the X-axis. ROC curves can be divided into success rate curves and prediction rate curves according to the dataset used. A success rate curve is formed using a training dataset, and represents how well the model fits the observed groundwater wells. A prediction rate curve is formed using a validation dataset, and represents how well the model predicts groundwater wells [63]. The ROC curve can be used as a quantitative measure through the calculation of the area under the curve (AUC). The AUC ranges from 0.5 (no better than random) to 1.0 (perfect discrimination), and a value closer to 1 indicates a model with a better predictive capability. AUC values can be evaluated as follows. Poor: 0.5–0.6, Average: 0.6–0.7, Good: 0.7–0.8, Very good: 0.8–0.9, and Excellent: 0.9–1.0 [64]. The ROC curves and AUC of GPMs created using the LR and MARS models are shown in Figure 5.
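To make the construction concrete, the sketch below builds ROC points by lowering a threshold over the model scores and integrates them with the trapezoidal rule. It assumes binary labels (1 for a well location, 0 for a non-well location) and uses hypothetical scores; it is an illustration of the general technique, not the procedure used in the study.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering the threshold over the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Handle all observations sharing the same score together (ties).
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels and scores for six locations.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_points(labels, scores))
print(round(area, 3))
```

Running the same functions on the training wells gives a success rate AUC, and on the validation wells a prediction rate AUC.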
For the success rate curves, the AUC values of the GPMs created using the LR and MARS models were 0.838 and 0.867, respectively; the AUC was thus 0.029 higher for the GPM created using the MARS model. The prediction rate curves showed the same pattern, with the AUC of the GPM created with the MARS model (0.836) being 0.035 higher than that of the GPM created with the LR model (0.801). All four AUC values are above 0.8, which corresponds to a very good predictive capability on the scale given above, and the MARS model performed better than the LR model.
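The reported AUC values can be mapped onto the rating scale from [64] with a few lines of code. The sketch below treats the lower bound of each band as inclusive, which is an assumption because the published bands share their end points; the function and variable names are hypothetical.

```python
# Rating bands from the text; inclusive lower bounds are an assumption.
BANDS = [
    (0.9, "Excellent"),
    (0.8, "Very good"),
    (0.7, "Good"),
    (0.6, "Average"),
    (0.5, "Poor"),
]


def rate_auc(auc):
    """Return the qualitative rating for an AUC between 0.5 and 1.0."""
    if not 0.5 <= auc <= 1.0:
        raise ValueError("AUC outside the 0.5-1.0 range covered by the scale")
    for lower, label in BANDS:
        if auc >= lower:
            return label


reported = {
    "LR success": 0.838,
    "MARS success": 0.867,
    "LR prediction": 0.801,
    "MARS prediction": 0.836,
}
ratings = {name: rate_auc(value) for name, value in reported.items()}
success_gain = round(reported["MARS success"] - reported["LR success"], 3)
prediction_gain = round(reported["MARS prediction"] - reported["LR prediction"], 3)
print(ratings, success_gain, prediction_gain)
```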

5. Discussion and Conclusions

Groundwater is an important natural resource, and the spatial distribution of groundwater can be detected and predicted by creating GPMs using various factors to ensure its continued availability. In this study, the LR and MARS models were used to evaluate groundwater potential and create GPMs. Groundwater well data (with high potential yields of ≥70 m³/d) were divided into a training dataset (70%, 859 groundwater well locations) and a validation dataset (30%, 365 groundwater well locations). This study used 16 groundwater influence factors for groundwater potential assessment, including topographic factors (altitude, slope degree, slope aspect, plan curvature, profile curvature), hydrologic factors (TWI, SPI, STI, distance from drainage, drainage density), geological factors (lithology, distance from fault, fault density, distance from lineament, lineament density), and land cover. Groundwater well locations and groundwater influence factors were applied to the LR and MARS models to analyze groundwater potential, and GPMs were created based on the results of this analysis. The accuracy of the models was tested using ROC curves. The GPMs created in this study all exhibited AUC values of 0.8 or above, indicating very good model performance. The AUCs for the success rate and prediction rate curves of the GPM created with the MARS model were, respectively, 0.029 and 0.035 higher than those of the GPM created using the LR model.
The MARS model also made it possible to estimate the relative contribution of each factor to groundwater potential. The results showed that land cover, altitude, slope degree, and distance from a fault made large contributions to groundwater occurrence. According to other studies, altitude, TWI, distance from rivers, land cover, fault density, slope, and lithology were important factors [9,20,39]. These findings are broadly similar to ours. However, in our study, lithology, distance from rivers, and fault density were less important factors. This could be due to differences in the study area conditions and the methods used.
Based on the results, the MARS model is more robust and has a better predictive capability than the LR model for the evaluation and mapping of groundwater potential in this study area. The MARS model has the following advantages compared to traditional regression-based analysis. The MARS model creates its final results by selecting only the important variables from the multiple candidate variables supplied to it [65]. This can effectively reduce the time needed for researchers to select the groundwater influence factors during GPM analysis. Even if unnecessary variables are included among the candidates, the optimal variables can still be selected by the MARS model. In addition, the MARS model is easy to interpret and can model complex relationships in a computationally efficient manner for multivariate problems involving large volumes of data [57].
The GPMs created in this study suggest the possibility of groundwater occurrence in the study area. Through a comprehensive understanding of groundwater potential, GPMs can be used to drive the exploration of groundwater resources effectively and economically, and prevent undesirable effects due to water resource development. Therefore, GPMs can be useful for decision-makers and planners to devise plans for sustainable water resource management, ecologically friendly land use, and environmental preservation of the study area. However, it is still necessary to apply the MARS model in more diverse areas and to compare it with other models to make a reliable judgment of its efficacy. Additional detailed spatial data reflecting geological and hydrogeological conditions should also be used to analyze groundwater potential in the future.

Acknowledgments

This research was financially supported by the BK21 plus Project of the Graduate School of Earth Environmental Hazard System. In addition, this work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. NRF-2017R1A2B2009033).

Author Contributions

Soyoung Park wrote the paper and analyzed the data; Se-Yeong Hamm managed the paperwork; Hang-Tak Jeon collected and prepared the input data; Jinsoo Kim suggested the idea for the study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fitts, C.R. Groundwater Science; Academic Press: San Diego, CA, USA, 2002.
  2. Shahid, S.; Nath, S.K.; Roy, J. Groundwater potential modeling in a soft rock area using a GIS. Int. J. Remote Sens. 2000, 21, 1919–1924.
  3. Qadir, M.; Wichelns, D.; Raschid-Sally, L.; Minhas, P.S.; Drechsel, P.; Bahri, A.; McCornick, P.G.; Abaidoo, R.; Attia, F.; El-Guindy, S. Water for Food, Water for Life: A Comprehensive Assessment of Water Management in Agriculture; Molden, D., Ed.; IWMI & Earthscan: London, UK, 2007.
  4. Mannap, M.A.; Nampak, H.; Pradhan, B.; Lee, S.; Soleiman, W.N.A.; Ramli, M.F. Application of probabilistic-based frequency ratio model in groundwater potential mapping using remote sensing data and GIS. Arab. J. Geosci. 2014, 7, 711–724.
  5. Bera, K.; Bandyopadhyay, J. Ground water potential mapping in Dulung watershed using remote sensing & GIS techniques, West Bengal, India. Int. J. Sci. Res. Publ. 2012, 2, 1–7.
  6. Sander, P.; Chesley, M.M.; Minor, T.B. Groundwater assessment using remote sensing and GIS in a rural groundwater project in Ghana: lessons learned. Hydrogeol. J. 1996, 4, 40–49.
  7. Singh, A.K.; Prakash, S.R. An integrated approach of remote sensing, geophysics and GIS to evaluation of groundwater potentiality of Ojhala sub-watershed, Mirjapur district, UP, India. In Proceedings of the First Asian Conference on GIS, GPS, Aerial Photography and Remote Sensing, Bangkok, Thailand, 7–9 August 2002.
  8. Waikar, M.L.; Nilawar, A.P. Identification of groundwater potential zone using remote sensing and GIS technique. Int. J. Innov. Res. Sci. Eng. Technol. 2014, 3, 12163–12174.
  9. Naghibi, S.A.; Dashtpagerdi, M.M. Evaluation of four supervised learning methods for groundwater spring potential mapping in Khalkhal region (Iran) using GIS-based features. Hydrogeol. J. 2016, 1–21.
  10. Reilly, T.E.; Dennehy, K.F.; Alley, W.M.; Cunningham, W.L. Ground-Water Availability in the United States; Circular 1323; U.S. Geological Survey: Reston, VA, USA, 2008.
  11. Elmahdy, S.I.; Mohamed, M.M. Probabilistic frequency ratio model for groundwater potential mapping in Al Jaww plain, UAE. Arab. J. Geosci. 2015, 8, 2405–2416.
  12. Naghibi, S.A.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Rezaei, A. Groundwater qanat potential mapping using frequency ratio and Shannon’s entropy models in the Moghan watershed, Iran. Earth Sci. Inform. 2015, 8, 171–186.
  13. Oh, H.J.; Kim, Y.S.; Choi, J.K.; Park, E.; Lee, S. GIS mapping of regional probabilistic groundwater potential in the area of Pohang City, Korea. J. Hydrol. 2011, 399, 158–172.
  14. Ozdemir, A. Using a binary logistic regression method and GIS for evaluating and mapping the groundwater spring potential in the Sultan Mountains (Aksehir, Turkey). J. Hydrol. 2011, 405, 123–136.
  15. Pourtaghi, Z.S.; Pourghasemi, H.R. GIS-based groundwater spring potential assessment and mapping in the Birjand Township, southern Khorasan Province, Iran. Hydrogeol. J. 2014, 22, 643–662.
  16. Corsini, A.; Cervi, F.; Ronchetti, F. Weight of evidence and artificial neural networks for potential groundwater spring mapping: An application to the Mt. Modino area (Northern Apennines, Italy). Geomorphology 2009, 111, 79–87.
  17. Adiat, K.A.N.; Nawawi, M.N.M.; Abdullah, K. Assessing the accuracy of GIS-based elementary multi criteria decision analysis as a spatial prediction tool—A case of predicting potential zones of sustainable groundwater resources. J. Hydrol. 2012, 440, 75–89.
  18. Razandi, Y.; Pourghasemi, H.R.; Neisani, N.S.; Rahmati, O. Application of analytical hierarchy process, frequency ratio, and certainty factor models for groundwater potential mapping using GIS. Earth Sci. Inform. 2015, 8, 867–883.
  19. Mogaji, K.A.; Lim, H.S.; Abdullah, K. Regional prediction of groundwater potential mapping in a multifaceted geology terrain using GIS-based Dempster-Shafer model. Arab. J. Geosci. 2015, 8, 3235–3258.
  20. Naghibi, S.A.; Pourghasemi, H.R. A comparative assessment between three machine learning models and their performance comparison by bivariate and multivariate statistical methods in groundwater potential mapping. Water Resour. Manag. 2015, 29, 5217–5236.
  21. Nampak, H.; Pradhan, B.; Manap, M.A. Application of GIS based data driven evidential belief function model to predict groundwater potential zonation. J. Hydrol. 2014, 513, 283–300.
  22. Al-Abadi, A.M.; Al-Temmeme, A.A.; Al-Ghanimy, M.A. A GIS-based combining of frequency ratio and index of entropy approaches for mapping groundwater availability zones at Badra–Al Al-Gharbi–Teeb areas, Iraq. Sustain. Water Resour. Manag. 2016, 2, 265–283.
  23. Yao, D.; Yang, J.; Zhan, X. A novel method for disease prediction: hybrid of random forest and multivariate adaptive regression splines. J. Comput. 2013, 8, 170–177.
  24. Hong, H.; Pourghasemi, H.R.; Pourtaghi, Z.S. Landslide susceptibility assessment in Lianhua County (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models. Geomorphology 2016, 259, 105–118.
  25. Pham, B.T.; Pradhan, B.; Bui, D.T.; Prakash, I.; Dholakia, M.B. A comparative study of different machine learning methods for landslide susceptibility assessment: A case study of Uttarakhand area (India). Environ. Model. Softw. 2016, 84, 240–250.
  26. Saito, H.; Nakayama, D.; Matsuyama, H. Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: The Akaishi Mountains, Japan. Geomorphology 2009, 109, 108–121.
  27. Trigila, A.; Iadanza, C.; Esposito, C.; Scarascia-Mugnozza, G. Comparison of logistic regression and random forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 2015, 249, 119–136.
  28. Wu, X.; Ren, F.; Niu, R. Landslide susceptibility assessment using object mapping units, decision tree, and support vector machine models in the Three Gorges of China. Environ. Earth Sci. 2014, 71, 4725–4738.
  29. Shruthi, R.B.; Kerle, N.; Jetten, V.; Stein, A. Object-based gully system prediction from medium resolution imagery using random forests. Geomorphology 2014, 216, 283–294.
  30. Carranza, E.J.M.; Laborte, A.G. Random forest predictive modeling of mineral prospectivity with small number of prospects and data with missing values in Abra (Philippines). Comput. Geosci. 2015, 74, 60–70.
  31. Leite, E.P.; de Souza Filho, C.R. Probabilistic neural networks applied to mineral potential mapping for platinum group elements in the Serra Leste region, Carajás Mineral Province, Brazil. Comput. Geosci. 2009, 35, 675–687.
  32. Rigol-Sanchez, J.P.; Chica-Olmo, M.; Abarca-Hernandez, F. Artificial neural networks as a tool for mineral potential mapping with GIS. Int. J. Remote Sens. 2003, 24, 1151–1156.
  33. Kisi, O.; Parmar, K.S. Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution. J. Hydrol. 2016, 534, 104–112.
  34. Lee, S.; Lee, C.W. Application of decision-tree model to groundwater productivity-potential mapping. Sustainability 2015, 7, 13416–13432.
  35. Rahmati, O.; Pourghasemi, H.R.; Melesse, A.M. Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: A case study at Mehran Region, Iran. Catena 2016, 137, 360–372.
  36. Gusyev, M.A.; Haitjema, H.M.; Carlson, C.P.; Gonzalez, M.A. Use of nested flow models and interpolation techniques for science-based management of the sheyenne national grassland, North Dakota, USA. Groundwater 2013, 51, 414–420.
  37. Xu, T.; Valocchi, A.J.; Choi, J.; Amir, E. Use of machine learning methods to reduce predictive error of groundwater models. Groundwater 2014, 52, 448–460.
  38. Yoon, H.; Jun, S.; Hyun, Y.; Bae, G.; Lee, K. A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer. J. Hydrol. 2011, 396, 128–138.
  39. Naghibi, S.A.; Pourghasemi, H.R.; Abbaspour, K. A comparison between ten advanced and soft computing models for groundwater qanat potential assessment in Iran using R and GIS. Theor. Appl. Climatol. 2017, 1–18.
  40. Ramos-Cañón, A.M.; Prada-Sarmiento, L.F.; Trujillo-Vela, M.G.; Macías, J.P.; Santos-R, A.C. Linear discriminant analysis to describe the relationship between rainfall and landslides in Bogotá, Colombia. Landslides 2016, 13, 671–681.
  41. Conoscenti, C.; Ciaccio, M.; Caraballo-Arias, N.A.; Gómez-Gutiérrez, Á.; Rotigliano, E.; Agnesi, V. Assessment of susceptibility to earth-flow landslide using logistic regression and multivariate adaptive regression splines: A case of the Belice River basin (western Sicily, Italy). Geomorphology 2015, 242, 49–64.
  42. Wang, L.J.; Guo, M.; Sawada, K.; Lin, J.; Zhang, J. Landslide susceptibility mapping in Mizunami City, Japan: A comparison between logistic regression, bivariate statistical analysis and multivariate adaptive regression spline models. Catena 2015, 135, 271–282.
  43. Eker, A.M.; Dikmen, M.; Cambazoğlu, S.; Düzgün, Ş.H.; Akgün, H. Evaluation and comparison of landslide susceptibility mapping methods: A case study for the Ulus district, Bartın, northern Turkey. Int. J. Geogr. Inf. Sci. 2015, 29, 132–158.
  44. Buyeo-gun office. Statistical Yearbook of Buyeo-Gun; Buyeo-gun: Chungcheongnam-do, Korea, 2016.
  45. Ministry of Environment. Groundwater Annual Report; Ministry of Environment: Sejong-si, Korea, 2016.
  46. Aniya, M. Landslide-susceptibility mapping in the Amahata River basin, Japan. Ann. Assoc. Am. Geogr. 1985, 75, 102–114.
  47. Moore, I.D.; Grayson, R.B.; Ladson, A.R. Digital terrain modelling: A review of hydrological, geomorphological, and biological applications. Hydrol. Process. 1991, 5, 3–30.
  48. Wischmeier, W.H.; Smith, D.D. Predicting Rainfall Erosion Losses: A Guide to Conservation Planning; United States Department of Agriculture: Washington, DC, USA, 1978.
  49. Moore, I.D.; Burch, G.J. Sediment transport capacity of sheet and rill flow: application of unit stream power theory. Water Res. 1986, 22, 1350–1360.
  50. Dinesh Kumar, P.K.; Gopinath, G.; Seralathan, P. Application of remote sensing and GIS for the demarcation of groundwater potential zones of a river basin in Kerala, southwest coast of India. Int. J. Remote Sens. 2007, 28, 5583–5601.
  51. Koike, K.; Nagano, S.; Kawaba, K. Construction and analysis of interpreted fracture planes through combination of satellite-image derived lineaments and digital elevation model data. Comput. Geosci. 1998, 24, 573–583.
  52. O’Leary, D.W.; Friedman, J.D.; Pohn, H.A. Lineament, linear, lineation: Some proposed new standards for old terms. Geol. Soc. Am. Bull. 1976, 87, 1463–1469.
  53. Hosmer, D.W.; Lemeshow, S. Applied Logistic Regression, 2nd ed.; John Wiley and Sons Inc.: New York, NY, USA, 2000.
  54. Felicísimo, Á.M.; Cuartero, A.; Remondo, J.; Quirós, E. Mapping landslide susceptibility with logistic regression, multiple adaptive regression splines, classification and regression trees, and maximum entropy methods: A comparative study. Landslides 2013, 10, 175–189.
  55. Gutiérrez, Á.G.; Schnabel, S.; Contador, J.F.L. Using and comparing two nonparametric methods (CART and MARS) to model the potential distribution of gullies. Ecol. Model. 2009, 220, 3630–3637.
  56. Friedman, J.H. Multivariate adaptive regression splines. Ann. Stat. 1991, 19, 1–141.
  57. Zhang, W.; Goh, A.T.; Zhang, Y. Multivariate adaptive regression splines application for multivariate geotechnical problems with big data. Geotech. Geol. Eng. 2016, 34, 193–204.
  58. Zabihi, M.; Pourghasemi, H.R.; Pourtaghi, Z.S.; Behzadfar, M. GIS-based multivariate adaptive regression spline and random forest models for groundwater potential mapping in Iran. Environ. Earth Sci. 2016, 75, 1–19.
  59. Menard, S. Applied Logistic Regression Analysis, 2nd ed.; SAGE University Series on Quantitative Applications in the Social Sciences; SAGE: Thousand Oaks, CA, USA, 1995.
  60. Ozdemir, A.; Altural, T. A comparative study of frequency ratio, weights of evidence and logistic regression methods for landslide susceptibility mapping: Sultan Mountains, SW Turkey. J. Asian Earth Sci. 2013, 64, 180–197.
  61. Milborrow, S. Notes on the Earth Package. Available online: https://www.milbo.org/doc/earth-varmod.pdf (accessed on 23 June 2017).
  62. Swets, J.A. Measuring the accuracy of diagnostic systems. Science 1988, 240, 1285–1293.
  63. Bui, D.T.; Pradhan, B.; Lofman, O.; Revhaug, I.; Dick, O.B. Landslide susceptibility mapping at Hoa Binh Province (Vietnam) using an adaptive neuro-fuzzy inference system and GIS. Comput. Geosci. 2012, 45, 199–211.
  64. Yesilnacar, E.; Topal, T. Landslide susceptibility mapping: A comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Eng. Geol. 2005, 79, 251–266.
  65. Kennison, R.F.; Cox, J. Health and functional limitations predict depression scores in the health and retirement study: Results straight from MARS. Calif. J. Health Promot. 2013, 11, 97–108.
Figure 1. Location of the study area. (a) administrative map showing one town and 15 townships; (b) groundwater well locations divided into training and validation datasets.
Figure 2. Flow chart of the methodology used in this study.
Figure 3. Factors influencing groundwater. (a) altitude; (b) slope degree; (c) slope aspect; (d) plan curvature; (e) profile curvature; (f) topographic wetness index; (g) stream power index; (h) sediment transport index; (i) distance from drainage; (j) drainage density; (k) lithology; (l) distance from fault; (m) fault density; (n) distance from lineament; (o) lineament density; and (p) land cover.
Figure 4. Groundwater potential maps produced by the (a) LR and (b) MARS model.
Figure 5. Results of model validation for each GPM. (a) success rate and (b) prediction rate curves.
Table 1. Data sources used in this study.

| Category | Factor | Source | Scale (Resolution) | GIS and Data Type |
|---|---|---|---|---|
| Well location | | National research paper; local research paper; field survey | | Point |
| Topographical factors | Altitude | Topographic digital map (1) | 1:5000 | Polyline, point |
| | Slope degree; slope aspect; plan curvature; profile curvature | Digital elevation map | 10 × 10 m | Raster |
| Hydrological factors | Topographic wetness index; stream power index; sediment transport index; distance from drainage; drainage density | Digital elevation map | 10 × 10 m | Raster |
| Geological factors | Lithology | Geology map (2) | 1:50,000 | Polygon |
| | Distance from fault; fault density | Geology map | 1:50,000 | Polyline |
| | Distance from lineament; lineament density | Hill-shaded map | 10 × 10 m | Raster |
| Land cover | Land cover | Land cover map (3) | 1:25,000 | Polygon |

(1) Topographic digital maps were obtained from the National Geographic Information Institute, Korea; (2) Geology maps were obtained from the Korea Institute of Geoscience and Mineral Resources; (3) Land cover maps were obtained from the Ministry of Environment, Korea.
Table 2. Multicollinearity diagnostic indices for independent variables.

| Factor | Tolerance | VIF |
|---|---|---|
| Altitude | 0.447 | 2.235 |
| Slope degree | 0.223 | 4.483 |
| Slope aspect | 0.522 | 1.916 |
| Plan curvature | 0.589 | 1.698 |
| Profile curvature | 0.747 | 1.338 |
| Topographic wetness index | 0.166 | 6.010 |
| Stream power index | 0.180 | 5.543 |
| Sediment transport index | 0.380 | 2.629 |
| Distance from drainage | 0.604 | 1.655 |
| Drainage density | 0.609 | 1.641 |
| Lithology | 0.943 | 1.061 |
| Distance from fault | 0.833 | 1.201 |
| Fault density | 0.762 | 1.312 |
| Distance from lineament | 0.492 | 2.031 |
| Lineament density | 0.497 | 2.012 |
| Land cover | 0.845 | 1.183 |
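Tolerance and the variance inflation factor (VIF) are reciprocals of each other, so either column of Table 2 can be recovered from the other up to rounding. The sketch below checks this for a few rows and applies the commonly used rule of thumb that a VIF above 10 or a tolerance below 0.1 signals problematic multicollinearity; these thresholds are stated here as an assumption about common practice, and the names are hypothetical.

```python
# (tolerance, reported VIF) pairs copied from Table 2.
TABLE2 = {
    "Altitude": (0.447, 2.235),
    "Slope degree": (0.223, 4.483),
    "Topographic wetness index": (0.166, 6.010),
    "Stream power index": (0.180, 5.543),
    "Lithology": (0.943, 1.061),
}


def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance (tolerance = 1 - R^2)."""
    return 1.0 / tolerance


# Largest gap between the recomputed and reported VIF (rounding effects only).
max_gap = max(abs(vif_from_tolerance(t) - v) for t, v in TABLE2.values())

# Rule-of-thumb thresholds (assumed): VIF > 10 or tolerance < 0.1.
flagged = [name for name, (t, v) in TABLE2.items() if v > 10 or t < 0.1]
print(round(max_gap, 3), flagged)
```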
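Table 3. Spatial relationships between groundwater wells and groundwater influencing factors determined using the frequency ratio model.

| Factor | Class | No. of Pixels for Domain | % of Domain | No. of Wells | % of Wells | Frequency Ratio |
|---|---|---|---|---|---|---|
| Altitude | 4–7 | 739,906 | 11.83 | 31 | 3.61 | 0.30 |
| | 7–13 | 758,490 | 12.13 | 121 | 14.09 | 1.16 |
| | 13–23 | 694,227 | 11.10 | 192 | 22.35 | 2.01 |
| | 23–39 | 706,011 | 11.29 | 186 | 21.65 | 1.92 |
| | 39–59 | 685,604 | 10.96 | 118 | 13.74 | 1.25 |
| | 59–85 | 671,769 | 10.74 | 76 | 8.85 | 0.82 |
| | 85–124 | 673,522 | 10.77 | 53 | 6.17 | 0.57 |
| | 124–189 | 662,166 | 10.59 | 68 | 7.92 | 0.75 |
| | >189 | 661,265 | 10.58 | 14 | 1.63 | 0.15 |
| Slope degree | 0.00 | 1,346,466 | 21.53 | 219 | 25.49 | 1.18 |
| | 0.00–2.96 | 540,629 | 8.65 | 143 | 16.65 | 1.93 |
| | 2.96–6.72 | 848,902 | 13.58 | 341 | 39.70 | 2.92 |
| | 6.72–10.76 | 578,596 | 9.25 | 110 | 12.81 | 1.38 |
| | 10.76–14.52 | 584,424 | 9.35 | 26 | 3.03 | 0.32 |
| | 14.52–18.29 | 595,343 | 9.52 | 7 | 0.81 | 0.09 |
| | 18.29–22.05 | 570,843 | 9.13 | 7 | 0.81 | 0.09 |
| | 22.05–27.16 | 592,491 | 9.48 | 4 | 0.47 | 0.05 |
| | >27.16 | 595,266 | 9.52 | 2 | 0.23 | 0.02 |
| Slope aspect | Flat | 1,346,466 | 21.53 | 219 | 25.49 | 1.18 |
| | N | 550,564 | 8.80 | 47 | 5.47 | 0.62 |
| | NE | 666,731 | 10.66 | 96 | 11.18 | 1.05 |
| | E | 656,324 | 10.50 | 83 | 9.66 | 0.92 |
| | SE | 682,654 | 10.92 | 109 | 12.69 | 1.16 |
| | S | 635,319 | 10.16 | 82 | 9.55 | 0.94 |
| | SW | 664,071 | 10.62 | 107 | 12.46 | 1.17 |
| | W | 530,399 | 8.48 | 60 | 6.98 | 0.82 |
| | NW | 520,432 | 8.32 | 56 | 6.52 | 0.78 |
| Plan curvature | Convex | 1,912,371 | 30.58 | 166 | 19.32 | 0.63 |
| | Flat | 2,574,062 | 41.17 | 505 | 58.79 | 1.43 |
| | Concave | 1,766,527 | 28.25 | 188 | 21.89 | 0.77 |
| Profile curvature | Convex | 2,035,086 | 32.55 | 212 | 24.68 | 0.76 |
| | Flat | 1,992,425 | 31.86 | 381 | 44.35 | 1.39 |
| | Concave | 2,225,449 | 35.59 | 266 | 30.97 | 0.87 |
| Topographic wetness index | −7.84–−5.20 | 728,730 | 11.65 | 19 | 2.21 | 0.19 |
| | −5.20–0.57 | 489,555 | 7.83 | 163 | 18.98 | 2.42 |
| | 0.57–1.65 | 1,308,964 | 20.93 | 57 | 6.64 | 0.32 |
| | 1.65–2.49 | 1,336,212 | 21.37 | 125 | 14.55 | 0.68 |
| | 2.49–3.45 | 717,823 | 11.48 | 176 | 20.49 | 1.78 |
| | 3.45–5.01 | 525,554 | 8.40 | 151 | 17.58 | 2.09 |
| | 5.01–7.89 | 649,627 | 10.39 | 104 | 12.11 | 1.17 |
| | 7.89–10.30 | 436,178 | 6.98 | 51 | 5.94 | 0.85 |
| | >10.30 | 60,317 | 0.96 | 13 | 1.51 | 1.57 |
| Stream power index | −3.82–−8.60 | 668,024 | 10.68 | 197 | 22.93 | 2.15 |
| | −8.60–−6.26 | 720,080 | 11.52 | 36 | 4.19 | 0.36 |
| | −6.26–−3.26 | 713,700 | 11.41 | 125 | 14.55 | 1.27 |
| | −3.26–−1.60 | 740,146 | 11.84 | 131 | 15.25 | 1.29 |
| | −1.60–−0.71 | 713,224 | 11.41 | 92 | 10.71 | 0.94 |
| | −0.71–0.07 | 750,100 | 12.00 | 67 | 7.80 | 0.65 |
| | 0.07–0.73 | 674,154 | 10.78 | 66 | 7.68 | 0.71 |
| | 0.73–1.62 | 656,256 | 10.50 | 64 | 7.45 | 0.71 |
| | >1.62 | 617,276 | 9.87 | 81 | 9.43 | 0.96 |
| Sediment transport index | 0.00 | 2,608,602 | 41.72 | 415 | 48.31 | 1.16 |
| | 0.00–14.80 | 2,731,789 | 43.69 | 426 | 49.59 | 1.14 |
| | 14.80–29.60 | 731,160 | 11.69 | 15 | 1.75 | 0.15 |
| | 29.60–44.39 | 123,380 | 1.97 | 2 | 0.23 | 0.12 |
| | 44.39–59.19 | 32,564 | 0.52 | 1 | 0.12 | 0.22 |
| | 59.19–73.99 | 12,138 | 0.19 | 0 | 0.00 | 0.00 |
| | 73.99–103.59 | 7,863 | 0.13 | 0 | 0.00 | 0.00 |
| | 103.59–162.78 | 3,257 | 0.05 | 0 | 0.00 | 0.00 |
| | >162.78 | 2,207 | 0.04 | 0 | 0.00 | 0.00 |
| Distance from drainage | 0–195.206 | 674,229 | 10.78 | 77 | 8.96 | 0.83 |
| | 195.206–429.45 | 740,195 | 11.84 | 107 | 12.46 | 1.05 |
| | 429.45–663.70 | 705,573 | 11.28 | 90 | 10.48 | 0.93 |
| | 663.70–917.47 | 710,245 | 11.36 | 104 | 12.11 | 1.07 |
| | 917.47–1190.76 | 706,354 | 11.30 | 114 | 13.27 | 1.17 |
| | 1190.76–1483.56 | 674,175 | 10.78 | 89 | 10.36 | 0.96 |
| | 1483.56–1834.94 | 684,110 | 10.94 | 110 | 12.81 | 1.17 |
| | 1834.94–2342.47 | 694,454 | 11.11 | 105 | 12.22 | 1.10 |
| | >2342.47 | 663,625 | 10.61 | 63 | 7.33 | 0.69 |
| Drainage density | 0.00–0.18 | 647,899 | 10.36 | 67 | 7.80 | 0.75 |
| | 0.18–0.73 | 764,117 | 12.22 | 145 | 16.88 | 1.38 |
| | 0.73–1.24 | 696,058 | 11.13 | 121 | 14.09 | 1.27 |
| | 1.24–1.74 | 735,696 | 11.77 | 109 | 12.69 | 1.08 |
| | 1.74–2.47 | 692,591 | 11.08 | 81 | 9.43 | 0.85 |
| | 2.47–3.21 | 692,813 | 11.08 | 69 | 8.03 | 0.72 |
| | 3.21–4.31 | 687,447 | 10.99 | 92 | 10.71 | 0.97 |
| | 4.31–5.73 | 669,019 | 10.70 | 93 | 10.83 | 1.01 |
| | >5.73 | 667,320 | 10.67 | 82 | 9.55 | 0.89 |
| Lithology | Metamorphic rock | 1,993,700 | 31.88 | 235 | 27.36 | 0.86 |
| | Sedimentary rock A | 3,251,216 | 51.99 | 440 | 51.22 | 0.99 |
| | Sedimentary rock B | 86,184 | 1.38 | 1 | 0.12 | 0.08 |
| | Igneous rock | 892,839 | 14.28 | 174 | 20.26 | 1.42 |
| | Dike and talus | 29,019 | 0.46 | 9 | 1.05 | 2.26 |
| Distance from fault | 0.00–777.23 | 680,735 | 10.89 | 38 | 4.42 | 0.41 |
| | 777.23–1498.94 | 725,020 | 11.59 | 73 | 8.50 | 0.73 |
| | 1498.94–2220.65 | 738,889 | 11.82 | 78 | 9.08 | 0.77 |
| | 2220.65–2997.87 | 689,019 | 11.02 | 122 | 14.20 | 1.29 |
| | 2997.87–3997.16 | 684,571 | 10.95 | 195 | 22.70 | 2.07 |
| | 3997.16–5163.00 | 700,837 | 11.21 | 120 | 13.97 | 1.25 |
| | 5163.00–6550.90 | 675,093 | 10.80 | 84 | 9.78 | 0.91 |
| | 6550.90–8493.97 | 685,678 | 10.97 | 74 | 8.61 | 0.79 |
| | >8493.97 | 673,118 | 10.76 | 75 | 8.73 | 0.81 |
| Fault density | 0.00 | 4,958,065 | 79.29 | 764 | 88.94 | 1.12 |
| | 0.00–0.05 | 411,100 | 6.57 | 43 | 5.01 | 0.76 |
| | 0.05–0.13 | 145,678 | 2.33 | 12 | 1.40 | 0.60 |
| | 0.13–0.28 | 137,777 | 2.20 | 13 | 1.51 | 0.69 |
| | 0.28–0.47 | 128,683 | 2.06 | 5 | 0.58 | 0.28 |
| | 0.47–0.74 | 118,625 | 1.90 | 8 | 0.93 | 0.49 |
| | 0.74–1.13 | 118,172 | 1.89 | 5 | 0.58 | 0.31 |
| | 1.13–1.83 | 117,638 | 1.88 | 8 | 0.93 | 0.50 |
| | >1.83 | 117,222 | 1.87 | 1 | 0.12 | 0.06 |
| Distance from lineament | 0.00–131.55 | 673,029 | 10.76 | 54 | 6.29 | 0.58 |
| | 131.55–279.54 | 714,676 | 11.43 | 95 | 11.06 | 0.97 |
| | 279.54–443.97 | 750,304 | 12.00 | 120 | 13.97 | 1.16 |
| | 443.97–624.85 | 714,445 | 11.43 | 127 | 14.78 | 1.29 |
| | 624.85–855.05 | 722,406 | 11.55 | 92 | 10.71 | 0.93 |
| | 855.05–1151.03 | 698,358 | 11.17 | 100 | 11.64 | 1.04 |
| | 1151.03–1529.23 | 663,276 | 10.61 | 100 | 11.64 | 1.10 |
| | 1529.23–2022.53 | 658,337 | 10.53 | 92 | 10.71 | 1.02 |
| | >2022.53 | 658,129 | 10.53 | 79 | 9.20 | 0.87 |
| Lineament density | 0.00 | 1,943,652 | 31.08 | 268 | 31.20 | 1.00 |
| | 0.00–0.13 | 583,577 | 9.33 | 76 | 8.85 | 0.95 |
| | 0.13–0.23 | 578,400 | 9.25 | 85 | 9.90 | 1.07 |
| | 0.23–0.35 | 593,925 | 9.50 | 104 | 12.11 | 1.27 |
| | 0.35–0.46 | 540,809 | 8.65 | 48 | 5.59 | 0.65 |
| | 0.46–0.60 | 518,321 | 8.29 | 93 | 10.83 | 1.31 |
| | 0.60–0.80 | 517,855 | 8.28 | 83 | 9.66 | 1.17 |
| | 0.80–1.09 | 495,919 | 7.93 | 64 | 7.45 | 0.94 |
| | >1.09 | 480,502 | 7.68 | 38 | 4.42 | 0.58 |
| Land cover | Urban | 322,571 | 5.16 | 110 | 12.81 | 2.48 |
| | Farmland | 2,504,403 | 40.05 | 672 | 78.23 | 1.95 |
| | Forest | 2,941,173 | 47.04 | 53 | 6.17 | 0.13 |
| | Grassland | 134,727 | 2.15 | 6 | 0.70 | 0.32 |
| | Wetland | 67,362 | 1.08 | 1 | 0.12 | 0.11 |
| | Bare land | 52,777 | 0.84 | 13 | 1.51 | 1.79 |
| | Water | 229,947 | 3.68 | 4 | 0.47 | 0.13 |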
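The frequency ratio of a class is the share of wells falling in that class divided by the share of the study area it occupies, so values above 1 indicate a class that holds more wells than its area alone would suggest. The sketch below recomputes the land cover rows of Table 3 from the pixel and well counts; the variable and function names are hypothetical.

```python
# (pixels in class, wells in class) copied from the land cover rows of Table 3.
LAND_COVER = {
    "Urban": (322571, 110),
    "Farmland": (2504403, 672),
    "Forest": (2941173, 53),
    "Grassland": (134727, 6),
    "Wetland": (67362, 1),
    "Bare land": (52777, 13),
    "Water": (229947, 4),
}


def frequency_ratios(classes):
    """Return {class: (% of wells) / (% of domain)} for one factor."""
    total_pixels = sum(pixels for pixels, _ in classes.values())
    total_wells = sum(wells for _, wells in classes.values())
    return {
        name: (wells / total_wells) / (pixels / total_pixels)
        for name, (pixels, wells) in classes.items()
    }


fr = {name: round(value, 2) for name, value in frequency_ratios(LAND_COVER).items()}
print(fr)
```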
Table 4. β coefficients of groundwater influence factors used in the logistic regression model.

| Factors | β | Std. Error | z Value | Pr (>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 5.016 | 1.816 | 2.762 | 0.006 * |
| Altitude | 0.002 | 1.5 × 10⁻³ | 1.419 | 0.156 |
| Slope degree | −0.231 | 3.9 × 10⁻² | −5.937 | 2.9 × 10⁻⁹ * |
| Slope aspect2 (N) | −1.412 | 1.015 | −1.391 | 0.164 |
| Slope aspect3 (NE) | −0.981 | 9.7 × 10⁻¹ | −1.016 | 0.309 |
| Slope aspect4 (E) | −1.250 | 1.013 | −1.234 | 0.217 |
| Slope aspect5 (SE) | −1.286 | 9.5 × 10⁻¹ | −1.351 | 0.177 |
| Slope aspect6 (S) | −1.126 | 1.013 | −1.112 | 0.266 |
| Slope aspect7 (SW) | −1.365 | 9.4 × 10⁻¹ | −1.455 | 0.146 |
| Slope aspect8 (W) | −1.384 | 1.026 | −1.349 | 0.177 |
| Slope aspect9 (NW) | −1.498 | 9.8 × 10⁻¹ | −1.532 | 0.126 |
| Plan curvature2 (Flat) | −0.036 | 2.7 × 10⁻¹ | −0.135 | 0.893 |
| Plan curvature3 (Concave) | 0.278 | 2.5 × 10⁻¹ | 1.132 | 0.258 |
| Profile curvature2 (Flat) | 0.629 | 2.4 × 10⁻¹ | 2.662 | 0.008 * |
| Profile curvature3 (Concave) | −0.051 | 1.9 × 10⁻¹ | −0.272 | 0.785 |
| Topographic wetness index | −0.367 | 1.4 × 10⁻¹ | −2.682 | 0.007 * |
| Stream power index | 0.351 | 1.4 × 10⁻¹ | 2.557 | 0.011 * |
| Sediment transport index | −5.8 × 10⁻⁴ | 6.6 × 10⁻³ | −0.087 | 0.931 |
| Distance from drainage | 2.3 × 10⁻⁴ | 9.5 × 10⁻⁵ | 2.392 | 0.017 * |
| Drainage density | −0.020 | 3.3 × 10⁻² | −0.609 | 0.543 |
| Lithology2 (Sedimentary rock A) | 0.172 | 1.5 × 10⁻¹ | 1.165 | 0.244 |
| Lithology3 (Sedimentary rock B) | −2.873 | 1.077 | −2.668 | 0.008 * |
| Lithology4 (Igneous rock) | 0.581 | 1.9 × 10⁻¹ | 3.059 | 0.002 * |
| Lithology5 (Dike and talus) | 0.187 | 6.5 × 10⁻¹ | 0.287 | 0.774 |
| Distance from fault | −1.0 × 10⁻⁵ | 2.3 × 10⁻⁵ | −0.444 | 0.657 |
| Fault density | −0.717 | 3.0 × 10⁻¹ | −2.43 | 0.015 * |
| Distance from lineament | −2.3 × 10⁻⁴ | 1.2 × 10⁻⁴ | −1.971 | 0.049 * |
| Lineament density | 0.227 | 2.4 × 10⁻¹ | 0.938 | 0.348 |
| Land cover2 (Farmland) | −0.332 | 2.1 × 10⁻¹ | −1.613 | 0.107 |
| Land cover3 (Forest) | −1.992 | 2.8 × 10⁻¹ | −7.064 | 1.6 × 10⁻¹² * |
| Land cover4 (Grassland) | −0.962 | 5.9 × 10⁻¹ | −1.618 | 0.106 |
| Land cover5 (Wetland) | −3.055 | 1.068 | −2.86 | 0.004 * |
| Land cover6 (Bare land) | 0.980 | 7.4 × 10⁻¹ | 1.321 | 0.187 |
| Land cover7 (Water) | −2.717 | 5.8 × 10⁻¹ | −4.659 | 3.2 × 10⁻⁶ * |

* p < 0.05.
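The z value in Table 4 is the coefficient divided by its standard error, and the p-value is the two-sided tail probability of a standard normal distribution (a Wald test). The sketch below reproduces two rows using only the standard library; it assumes the usual Wald test was used, and the function names are hypothetical.

```python
import math


def wald_z(beta, std_error):
    """Wald statistic: coefficient divided by its standard error."""
    return beta / std_error


def two_sided_p(z):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# (beta, standard error) copied from Table 4.
rows = {
    "(Intercept)": (5.016, 1.816),
    "Lithology3 (Sedimentary rock B)": (-2.873, 1.077),
}
results = {}
for name, (beta, se) in rows.items():
    z = wald_z(beta, se)
    results[name] = (round(z, 3), round(two_sided_p(z), 3))
print(results)
```

Rows whose standard error is reported to only two significant figures reproduce the tabulated z value less exactly, because of rounding in the published standard error.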
Table 5. The contributions of various independent variables in the MARS model.

| Factors | Nsubset | GCV | RSS |
|---|---|---|---|
| Land cover3 (Forest) | 24 | 100 | 100 |
| Altitude | 23 | 60.8 | 65.3 |
| Slope degree | 22 | 49.2 | 55.4 |
| Land cover7 (Water) | 18 | 22.6 | 34.3 |
| Distance from fault | 17 | 16.9 | 30.6 |
| Land cover5 (Wetland) | 15 | 15.8 | 28.7 |
| Distance from lineament | 13 | 13.3 | 26 |
| Lineament density | 13 | 13.3 | 26 |
| Lithology3 (Sedimentary rock B) | 13 | 12.7 | 25.8 |
| Distance from drainage | 11 | 10.5 | 23.2 |
| Lithology2 (Sedimentary rock A) | 9 | 9.4 | 20.9 |
| Drainage density | 7 | 7.7 | 18.2 |
| Lithology4 (Igneous rock) | 3 | 5 | 11.9 |
| Fault density | 2 | 3.2 | 9.4 |

Share and Cite

MDPI and ACS Style

Park, S.; Hamm, S.-Y.; Jeon, H.-T.; Kim, J. Evaluation of Logistic Regression and Multivariate Adaptive Regression Spline Models for Groundwater Potential Mapping Using R and GIS. Sustainability 2017, 9, 1157. https://doi.org/10.3390/su9071157
