Assessment of Groundwater Nitrate Pollution Potential in Central Valley Aquifer Using Geodetector-Based Frequency Ratio (GFR) and Optimized-DRASTIC Methods

Groundwater nitrate contamination in the Central Valley (CV) aquifer of California is widespread throughout the valley because of excess nitrogen fertilizer leaching down into the aquifer. The percolation of nitrate depends on several hydrogeological conditions of the valley. Groundwater contamination vulnerability mapping uses hydrogeologic conditions to predict vulnerable areas. This paper presents a new Geodetector-based Frequency Ratio (GFR) method and an optimized-DRASTIC method to generate nitrate vulnerability index values for the CV. The optimized-DRASTIC method combined the individual weights and rating values for Depth to water, Recharge rate, Aquifer media, Soil media, Topography, Impact of vadose zone, and Hydraulic conductivity. The GFR method incorporated the Frequency-Ratio (FR) method to derive rating values and the Geodetector method to derive relative Power of Determinant (PD) values as weights to generate nitrate susceptibility index map. The optimized-DRASTIC method generated very-high to high index values in the eastern part of the CV. The GFR method showed very-high index values in most part of the San Joaquin and Tulare basin. The quantitatively derived rating values and weights in the GFR method improved the vulnerability index and showed better consistency with the observed nitrate contamination pattern than optimized-DRASTIC index, suggesting that GFR is a better method for groundwater contamination vulnerability mapping in the CV aquifer.


Introduction
The Central Valley (CV) aquifer lies under one of the most productive areas of the United States.It accounts for 17% of the total national irrigated land used for agriculture and 75% of the irrigated land of the state of California [1].In the CV, for the past several decades huge amount of chemical nitrogen fertilizer has been applied to increase the crop productivity [2].Based on a mass balance study, the long-term assessment of nitrogen loading in the Sacramento Valley, San Joaquin Basin, and Tulare Basin of the CV observed significant increase in the nitrogen fertilizer application rate between year 2002-2012 compared to 1991-2001 [3].The estimated 740,000 tons of nitrogen fertilizer was applied to 6.7 million acres of irrigated farmland in California.The excess nitrogen fertilizer, not absorbed by crops, as a chemical contaminant, easily dissolves in water and leaches into the aquifer to pollute the groundwater [4].Harter [4] estimated more than 80 pounds of nitrogen per acre per year leaching into the groundwater.The nitrate contamination level in some public, domestic, and monitoring wells of the CV have exceeded the EPA maximum contaminant level (MCL) of 10 mg/L [5][6][7].Several studies have attributed the groundwater nitrate concentration level to agricultural activities of the region [7][8][9].
The susceptibility of groundwater nitrate contamination is influenced by the aquifer characteristics and geochemical condition of the valley, which could enhance or reduce the leaching rate of nitrate into the aquifer [10].The hydrogeological conditions like shallow water table, high permeability, and high hydraulic conductivity enhance the downward percolation rate of the nitrate into the aquifer.The lower nitrate concentration has been found in reducing environment due to denitrification process that breaks down nitrate into nitrogen gas [11].The interplay of these different environmental factors often makes it difficult to predict the areas vulnerable to groundwater nitrate contamination.However, identification of vulnerable areas is of extreme importance for groundwater management agencies to develop site-specific mitigation tools and adopt appropriate policies to mitigate the contamination.
The process-based, statistical, and index-overlay methods are the most widely used techniques to assess the vulnerability of the aquifer to groundwater nitrate contamination.Process-based methods are deterministic ways of predicting the nitrate contamination based on solute transport model.Physical, chemical, geological, and biological processes are considered to simulate the subsurface flow of contaminants, involving complex numerical computations [12].The process-based methods are effective at local scale prediction, but because of detailed data requirements, they are not practical for large spatial scale prediction [13,14].
Statistical methods are robust ways of analyzing the relationship between groundwater nitrate contamination and predictor variables.The multiple linear regression calculates individual coefficients of the predictor variables to measure its contribution to the nitrate contamination [15,16].There are numerous studies where logistic regression has been applied to calculate the probability of nitrate concentration above the background value [17,18].Other statistical methods have been used to measure the linear trend [19], comparing the mean nitrate concentrations under different agriculture settings [8] or the difference of nitrate concentration over different time intervals [20].The computing power of statistical method to process large data in a short period of time signifies the strength of these methods.However, careful analysis is still warranted in the interpretation of statistical output.The availability of sufficient data meeting the requirement of the statistical analysis often becomes challenging.Further, site-specific nitrate contamination data or data covering regional or national scale are sparse and not uniform, which could cause biasness in the statistical results.Long term monitoring of groundwater data over large spatial scale demands government priority and budgeting over a long period of time.
The index and overlay method uses geological and hydrogeological setting to assess the vulnerability of the study area to the groundwater contamination.This method is preferred for its simplicity to map the vulnerable areas by using different layers of data that affect the transport of nitrate to create spatially oriented index values.Several index and overlay methods has been applied to assess the vulnerability of aquifer such as DRASTIC [21], GOD [22], AVI, and SINTACS [23].The selection of correct method is important in generating the vulnerability map and site-specific methods have been proposed in different study.For example, EPIK and RISK methods are used for Karst aquifers, the AVI method for porous aquifers and GOD for sedimentary aquifers.DRASTIC has been a widely used method and is applicable to all aquifer types [21].DRASTIC was developed by Aller et al. (1987) for the United States Environmental Protection Agency (EPA).The DRASTIC method has been applied frequently to assess the vulnerability of the study area to groundwater nitrate contamination [13,14,[24][25][26].Recently, the application of GIS techniques has enhanced the ability to process different data layers to compute the DRASTIC Index values [27,28].The method has been relatively successful in generating the vulnerability map with minimum error in the model results.It is important to understand that the selection of vulnerability method could produce difference in the output map and therefore researcher should be aware in using the correct method.Some of the limitations of DRASTIC method include its subjectivity in assigning the weight values to its variables and the dimensionless index values [29,30].The pre-assigned weights to DRASTIC variables could be inaccurate for a given study area and could create biases in the calculation of DRASTIC Index values, as the influence of variables to vulnerability could change in different study areas.The generalization of the subjective weights of DRASTIC method have been criticized in several studies [30].The need to add additional variables to increase the efficiency of method has been realized in many studies.Several studies have modified the DRASTIC variables by adding landuse variables [14,31] and calibrating the point rating of variables based on correlation of nitrate concentration and hydrological variables to the model or modified the rating values to increase the efficiency of the method [32].Some authors have also emphasized on the availability and reliability of data suitable for the DRASTIC methods and interdependence of variable in linear combination could cause uncertainty in the method [30].
Index-overlay methods, process-based methods, and statistical methods all have their strengths and limitations.The pros and cons of these methods can be accounted to correctly meet the scientific objective, budgeting constraints, and to aid in the decision supporting process.In this study, we developed two nitrate vulnerability maps: one based on the optimized-DRASTIC method and the other based on the new Geodetector-Frequency Ratio (GFR) method proposed here.Geodetector is a relatively new method that measures the correspondence between the incidence rate of events and the spatially stratified heterogeneity of explanatory variables [33].The Geodetector calculates the Power of Determinant (PD) values for each contributing factor (or independent variable) based on the spatial variance analysis, which estimates the degree of influence a variable has on the incidence rate, which is groundwater nitrate contamination in this case.The Frequency-ratio (FR) is a technique commonly used in landslide susceptibility mapping [34,35] and was recently introduced in groundwater vulnerability mapping [36].The FR measures the percent of area with incidence in each interval of a factor to the percent of area of that interval relative to the whole study area.The interval with higher FR is more susceptible to the incidence.We assigned the weights to the predictor variables based on the Geodetector results and assigned the rating based on the FR method to compute the Nitrate Susceptibility Index (NSI) using the geometric mean [37].
Therefore, the objective of this paper is to generate a nitrate vulnerability map of the CV using the quantitatively derived rating and weights of the variables using GFR method and to compare the result with that generated from optimized-DRASTIC model.As will become evident later, the GFR method generated a map that is more consistent with the spatial distribution of contaminated wells in the CV, because GFR method integrates Geodetector and Frequency-Ratio method, which overcomes the subjective assignment of weights and rating values of variables in optimized-DRASTIC method.

Central Valley Aquifer System
The Central Valley (CV) is one of the largest structural troughs in California (CA) that is predominantly covered with continental and marine sediments.The northern one-third of the valley is Sacramento Valley drained by Sacramento River.The southern two-thirds of the valley is San Joaquin valley further divided into San Joaquin Basin and Tulare Basin drained by San Joaquin River.The CV is bounded by Cascade Range and Sierra Nevada in the east and by Coast Ranges in the west (Figure 1).
The valley floor is mostly layered with alluvial deposits and flood plain deposits of river systems.The sediment deposits are predominantly sand and gravel with significant amount of fine-grained silt and clay.The CV is believed to be a single heterogeneous aquifer system [38].The groundwater flow system in the CV has been altered due to intensive groundwater pumping, causing the water table to drop hundreds of feet thereby increasing hydraulic gradient and directing the groundwater flow into the deeper confined aquifer.Figure 2 shows the change in the flow direction of groundwater before and after the development in the valley.More than 400 feet of water level decline has been measured in the San Joaquin valley.The wells in the San Joaquin are deeper than wells in the Sacramento Valley.The over-pumping of groundwater has induced land subsidence and groundwater quality problem in the CV.The changing hydrogeologic conditions and excess nitrogen fertilizer have complicated the fate and transport of contaminant in the valley [39].

Nitrate Contamination in the CV at Groundwater Basin Scale
The Central Valley groundwater basin data was downloaded from California Department of Water Resources (DWR) [41] .The groundwater basins were identified by California Department of Water Resources (DWR) based on the extraction and recharge of groundwater in those basins to properly manage the sustainable groundwater project in the CV.The Bulletin 118-California's Groundwater has more details on the groundwater basins [42].
The groundwater nitrate concentration data was downloaded from National Water Quality Assessment Program (NAWQA), National Water Information System (NWIS) and Groundwater Ambient Monitoring Assessment (GAMA) for the period of 2002 to 2014 to compute decadal average of groundwater nitrate contamination in the sampled wells.A total of 2516 well samples were collected, and 1014 well samples have mean concentration level above 5 mg/L.The number of wells that exceeded the mean concentration level of 5 mg/L were identified in each groundwater basin.
The background concentration of nitrate in groundwater is usually low in the range of 1-2 mg/L and concentration level greater than this is attributed to the anthropogenic sources [43].The EPA MCL of nitrate in drinking water is 10 mg/L and concentration higher than MCL could cause several health impacts [44].Therefore, 5 mg/L was selected as the threshold as it is deemed to better capture the spatial variability of nitrate contamination [19].The total number of well samples and well samples above 5 mg/L in the groundwater basin of the CV is shown in the Figure 3.

Nitrate Contamination in the CV at Groundwater Basin Scale
The Central Valley groundwater basin data was downloaded from California Department of Water Resources (DWR) [41] .The groundwater basins were identified by California Department of Water Resources (DWR) based on the extraction and recharge of groundwater in those basins to properly manage the sustainable groundwater project in the CV.The Bulletin 118-California's Groundwater has more details on the groundwater basins [42].
The groundwater nitrate concentration data was downloaded from National Water Quality Assessment Program (NAWQA), National Water Information System (NWIS) and Groundwater Ambient Monitoring Assessment (GAMA) for the period of 2002 to 2014 to compute decadal average of groundwater nitrate contamination in the sampled wells.A total of 2,516 well samples were collected, and 1,014 well samples have mean concentration level above 5 mg/L.The number of wells that exceeded the mean concentration level of 5 mg/L were identified in each groundwater basin.
The background concentration of nitrate in groundwater is usually low in the range of 1-2 mg/L and concentration level greater than this is attributed to the anthropogenic sources [43].The EPA MCL of nitrate in drinking water is 10 mg/L and concentration higher than MCL could cause several health impacts [44].Therefore, 5 mg/L was selected as the threshold as it is deemed to better capture the spatial variability of nitrate contamination [19].The total number of well samples and well samples above 5 mg/L in the groundwater basin of the CV is shown in the Figure 3.

Optimized-DRASTIC Method
DRASTIC is a widely used method for assessing the susceptibility of aquifer to groundwater contamination.The DRASTIC consists of seven different variables which are used as the influencing factors to derive the groundwater susceptibility to contamination.The seven variables are: Depth to water (D), Rate of recharge (R), Aquifer media (A), Soil media (S), Topography (T), Impact of vadose zone (I), and Hydraulic conductivity (C).
Each variable is assigned a subjective weight to indicate its influence on the susceptibility.The source of data for all the DRASTIC variable is shown in Table 1.Here, we adopted an optimized-DRASTIC method, where each variable was classified into five different intervals using the natural breaks classification to capture the local hydrogeological setting of the CV [14,25,32,45].The higher rating value was assigned if the interval facilitates the vulnerability.The DRASTIC Index (DI) is the linear combination of all the product of variable weight value and rating value of a given interval (see Equation (1)) and an example is shown in Table 2.
where subscript r represents rate and subscript w represents weight.
The computed optimized-DRASTIC Index was then used as a relative evaluation index to describe the susceptibility of the given area to groundwater contamination.High values of Drastic Index (DI) represent the more vulnerable areas and low values represent the less vulnerable areas.The assumptions of DRASTIC methods are: The contaminant is introduced at the ground surface; the contaminant is flushed into the ground by precipitation; the contaminant has the mobility of water; the area evaluated is 100 acres or larger [21].The seven DRASTIC variables were downloaded and processed to calculate the DRASTIC Index.Table 2 shows the weights and rating values assigned to each variables and interval range.
• Depth to Water Depth to water (DTW) represents the depth to groundwater table below the ground surface.The areas with shallow DTW are considered more vulnerable as it takes less time for contaminant to move into the aquifer in these areas.The data was downloaded from California Department of Water Resources: Groundwater Information Center Interactive Map Application for the year 2014 to represent the end period of the nitrate sampling period from 2002 to 2014 [41].The data was then interpolated using Kriging method in ArcGIS over the CV aquifer.Kriging has been found effective in the interpolation of depth to water table in many studies [52,53].Kriging estimates the values at unknown locations using the weighted average of the known values.The weights in the Kriging method are based on the both distance and overall spatial arrangement of the well samples which is calculated using the variogram technique.A function that best fits the measured data is used to predict the value at the unknown location.The ordinary kriging was selected with minimum standard error.The mean zonal statistics was calculated for each groundwater basin using the interpolated DTW value.

• Recharge Rate
The recharge rate is the rate at which water on the land surface percolates into the aquifer.The percolation rate of water into the aquifer depends on the characteristics of vadose zone.Recharge rate is higher in permeable areas with large fractures in rocks or in unconsolidated sand and gravels, where water can easily pass through the interspace between the soil or sand particles.Soil layer that consists of higher percent of clay reduces the travel time of groundwater into the aquifer and thus the recharge rate.Recharge rate also tends to be higher in high altitude where precipitation is higher, increasing the chances of more water percolating into the groundwater [1].The data for estimated mean annual natural groundwater-recharge in the Conterminous United States was downloaded from United States Geological Survey (USGS), which is a raster dataset with 1 km resolution [46].The zonal mean for each groundwater basin was calculated using the raster data and classified into five intervals of natural breaks and assigned corresponding ratings for optimized-DRASTIC method.

• Aquifer Media
The aquifer media is the underlying geology that determines the water holding capacity of rock materials.Groundwater is stored in the pores and fractures of rocks within the aquifer which determines the groundwater flow rate.Karst limestone, basalt, and unconsolidated sand and gravel have higher flow rate due to their structure than metamorphic, igneous rock, or shales.
The GIS database of geologic units in California is available in the California Geological Survey website [54].The geologic database identifies the major rock type with unique geocode.These rocks, based on the unique geocode, were classified as Quaternary Alluvium, Sedimentary, Metamorphic, or igneous based on Anning et al. [10].The corresponding rating values were assigned based on Aller's et al. [21].Table 3 shows the geocode classification system and rating values.Soil hydrologic group was numerically assigned based on the Battaglin's methods.The codes were 1, 2, and 3 for hydro groups A, B, and C, respectively.The number 4 was assigned to hydro group A/D, B/D, and C/D [48].The map unit identifiers (MUID) in the dataset represent the unique hydrologic conditions.Each unique MUID hydrologic condition was intersected with the groundwater basins.The percent of area of each MUID in each groundwater basin was multiplied by the soil hydrologic value.The product of each proportional area and MUID values was then added up to represent the total value hydrologic value of each basin.The values ranged from 1 (well drained-A) to 4 (poorly drained-D) to represent the soil media of the CV.

• Topography
Topography in DRASTIC is the slope of land surface.The steep topography has less chance of infiltration due to runoff.The plain areas will have higher infiltration rate and hence increase the chance of infiltration of nitrate into the aquifer.The slope data was downloaded from USGS (Elevation derivatives for National Application (EDNA), 2005) [49], which was derived from the National Elevation Dataset (NED).Mean zonal statistics of slope value were calculated for each groundwater basin and classified based on natural breaks classification.

• Impact of Vadose Zone
The vadose zone is the layer above the water table and the Impact of vadose zone is critical to the movement of groundwater into the aquifer.The texture of this layer controls the attenuation capacity of the contaminants including biodegradation, naturalization, mechanical filtration, chemical reaction, volatilization, and flow rate of groundwater [21].The longer residence time of groundwater in vadose zone provides longer time for the attenuation.The Impact of Vadose Zone value was calculated using the soil permeability and depth to water table values, based on the method described in reference [51].The vadose zone values were then given rating values to calculate in the optimized-DRASTIC method.

• Hydraulic Conductivity
Hydraulic Conductivity measures the easiness of water flowing into the aquifer.It depends on the soil pore geometry, the fluid viscosity, and density.The higher hydraulic conductivity increases the water percolation rate into the aquifer.Area and depth weighed average hydraulic conductivity values is available in the USGS website [50].The weighted hydraulic conductivity values for each ISPRS Int.J. Geo-Inf.2018, 7, 211 9 of 26 groundwater basin in the CV were calculated by adding the proportional MUID area and hydraulic conductivity values within each basin.

Geodetector Method
The Geodetector-based frequency ratio method (GFR) integrates the Geodetector method and Frequency Ratio method to calculate Nitrate Susceptibility Index (NSI).The Geodetector method is a spatial variance technique, which measures the spatial association between the predictor variables and the dependent variable [55].The assumption is that if a predictor variable leads to groundwater nitrate contamination, the contamination would exhibit a spatial distribution similar to that of the predictor variable [55,56].The Power of Determinant (PD) value for each predictor variable was obtained by calculating the ratio of sum of variances of predictor variables in the stratified zones (stratified heterogeneity) to the total variance in the entire study area.It is given by the formula: where N is the total sample in the entire study area; L is the number of zones of predictor variable; σ is the variance; and subscript i represents the i-th zone.
If the spatial associate is real, the local variances should be close to zero or very small.If the spatial associate is not there, then the local variance is expected to be the same as the global variance.So, the spatial association can be measured by comparing the local and global variances and encapsulated in the Power of Determinant (PD), which is calculated as 1 minus the ratio of local to global variances.The PD value ranges between 0 and 1, with 1 meaning perfect spatial association and 0 no association.The significant variables at significance level of 0.05 were retained to calculate relative PD values of each variables.Details on the processing of Geodetector method can be found in reference [26].The PD values of significant predictor variables were normalized to calculate its relative contribution in relation to other significant variables.It is given by the formula below: Relative PD = PD value o f Predictor variable Sum o f PD values o f all signi f icant Predictor variables. (3)

Development of Geodetector Variables
The Geodetector method was performed on 12 different predictor variables, which comprised of source variables, aquifer susceptibility variables, and geochemical condition of the CV that could alter the groundwater nitrate contamination of the CV.The details on the processing of the variables can be found in reference [26].Table 4 below shows the summary of the data sources of the predictor variables.All of these variables were input into the Geodetector software to calculate their PD values and significance level.The software is available for free at http://www.geodetector.org.

Frequency Ratio Method
The Frequency Ratio (FR) method was used to obtain the rating values on different intervals of significant predictor variables.Each numerical predictor variable is discretized into different intervals to represent the stratified heterogeneity.Since we used groundwater basin as our basic unit, the number of wells within each interval was counted and divided by total number of wells of the study area to obtain percent of wells in each interval.Similarly, the number of contaminated wells in each interval was counted and divided by total number of contaminated wells of the study area to obtain percent of contaminated wells in each interval.Frequency ratio is the ratio of the two percentages as follows: Frequency Ratio is a commonly applied technique in landslide studies to calculate the landslide incident rate in vulnerable areas.The Frequency Ratio value of greater than 1 indicates strong relationship with the groundwater nitrate contamination and less than 1 indicates weak relationship [66].

Geodetector Frequency Ratio Index
The Geodetector Frequency-Ratio Index was calculated as a weighted geometric mean [37] using the relative weight of significant predictor variables and their frequency ratio values in each interval given by the formula below: where n is the total number of explanatory variables; B k is the normalized PD values for the kth explanatory variable; and A k is the frequency ratio of the kth explanatory variable.

Maps of DRASTIC Variables
Figure 4 shows the distribution of Depth to Water (DTW) in the Central Valley.The DTW was relatively shallow in the Sacramento Valley and deeper in San Joaquin and Tulare Basins.
The hydrogeological characteristics of San Joaquin and Tulare Basins have changed over the years due to excessive pumping of groundwater.This has caused the lowering of water table in this part of the valley [67] .
The Net Recharge rate in the CV is shown in Figure 5.The recharge rate was mostly low in most part of the valley and relatively higher around eastern flank of the valley, mainly around southeastern San Joaquin Basin and northeastern Tulare Basin.The CV lies in the rain shadow due to the presence of Coastal Range in the west acting as orographic barriers to the moisture laden onshore oceanic winds.The eastern part of the valley is Sierra Nevada with higher elevation.This leads to higher annual precipitation rate due to orographic effect and increasing recharge rate there.
Most of the CV is underlain by Quaternary alluvium and surficial marine and non-marine sedimentary rocks deposited from Jurassic to Holocene period (Figure 6).These deposits comprise mostly unconsolidated and semi-unconsolidated sediments, including terrace deposits, sandstone, shale, and gravel deposits.The changes in the deposition environment of the CV has caused overlaying of continental deposits on the marine depositions and formed the freshwater aquifer of the valley [1]. Figure 7 shows the soil media map of the CV aquifer.The Sacramento Valley had mainly soil of hydrologic groups C and D. The San Joaquin and Tulare Basins were mostly hydrologic groups A and B. This soil hydrologic group C was moderately fine to fine textured which permits slow infiltration rate of water through them.Hydrologic soil group A soils drain easily as they have sands or gravels and high infiltration rates.Hydrologic group B soils are mostly moderately fine texture to moderately coarse texture that permits moderately well drained soils.
ISPRS Int.J. Geo-Inf.2018, 7, x FOR PEER REVIEW 11 of 28 sands or gravels and high infiltration rates.Hydrologic group B soils are mostly moderately fine texture to moderately coarse texture that permits moderately well drained soils.3 for details on Geocodes).

Figure 7. Soil Media in Central Valley
The CV had mostly low slope areas between slope percent of 0 to 2% (Figure 8).It can be observed from the figure that Tulare basin had more low slope areas than other parts of the valley.The Impact of Vadose Zone (IVZ) is represented in the Figure 9.The San Joaquin Basin had higher IVZ ratings than Sacramento Valley and Tulare Basin.The Hydraulic Conductivity was low around Sacramento Valley compared to San Joaquin and Tulare Basin (Figure 10).The CV had mostly low slope areas between slope percent of 0 to 2% (Figure 8).It can be observed from the figure that Tulare basin had more low slope areas than other parts of the valley.The Impact of Vadose Zone (IVZ) is represented in the Figure 9.The San Joaquin Basin had higher IVZ ratings than Sacramento Valley and Tulare Basin.The Hydraulic Conductivity was low around Sacramento Valley compared to San Joaquin and Tulare Basin (Figure 10).

DRASTIC Susceptibility Index
The optimized-DRASTIC Susceptibility Index (DI) for the CV was generated using the method described in the Method section (see Figure 11 The power of determinant (PD) values from the Geodetector method is shown in Table 5 below.Shrestha and Luo [26] calculated precipitation, fertilizer, elevation, manure, and clay as the most significant variables in the CV with high PD values out of 12 different explanatory variables.These significant variables suggest the significance of their spatial heterogeneity in contributing to the The general pattern of high index values was observed in the eastern side of the valley compared to western side.The Tulare Basin had high index values in the eastern side and low to very low in western side.The San Joaquin Basin had mostly very high to high index values except small percent of area towards the western flank.The Sacramento Valley also showed relatively higher index values in the eastern part than the western part of the valley.The power of determinant (PD) values from the Geodetector method is shown in Table 5 below.Shrestha and Luo [26] calculated precipitation, fertilizer, elevation, manure, and clay as the most significant variables in the CV with high PD values out of 12 different explanatory variables.These significant variables suggest the significance of their spatial heterogeneity in contributing to the groundwater contamination in the CV.Here, we normalized the PD values to be used as the relative weights in the GFR method.The normalized PD values of the all the significant variables is also shown in the Table 5 below.The precipitation had the highest relative PD value of 0.29 followed by fertilizer, elevation, manure, and clay.show the five statistically significant variables used in the GFR method.All variables were discretized into five zones according to the natural breaks method [68].The precipitation map (Figure 12) of the CV shows low precipitation in Tulare Basin and gradually increasing towards the San Joaquin Basin, and highest around the Sacramento Valley.Figure 13 shows the fertilizer loading in the CV aquifer.The fertilizer loading in the CV was very high around Tulare Basin and San Joaquin Basin.Sacramento Valley had only slightly lower fertilizer loading than San Joaquin and Tulare Basin.
The manure loading was higher in San Joaquin Valley and Tulare Basins than that in the Sacramento Valley (Figure 14).The elevation of Sacramento Valley and San Joaquin Basin was slightly lower than that of the Tulare Basin (Figure 15).The percent of clay was higher in Sacramento valley than that in San Joaquin and Tulare Basins (Figure 16).It was lowest around eastern flank of San Joaquin Basin and Tulare Basin.

Frequency Ratio Method
The Frequency Ratio (FR) value in each interval for all the significant predictor variables is shown in the Table 6.The FR values greater than 1 for all the intervals are highlighted in bold in the table.The FR was greater than 1 in low precipitation levels (1 and 2).The FR was greater than 1 in higher fertilizer levels (3, 4 and 5), in low elevation levels (2 and 3), in high manure levels (3, 4, and 5).The FR value for clay percent was greater than 1 at levels 1, 2, 4, and 5.
Tulare Basin and San Joaquin Basin.Sacramento Valley had only slightly lower fertilizer loading than San Joaquin and Tulare Basin.
The manure loading was higher in San Joaquin Valley and Tulare Basins than that in the Sacramento Valley (Figure 14).The elevation of Sacramento Valley and San Joaquin Basin was slightly lower than that of the Tulare Basin (Figure 15).The percent of clay was higher in Sacramento valley than that in San Joaquin and Tulare Basins (Figure 16).It was lowest around eastern flank of San Joaquin Basin and Tulare Basin.

Geodetector Frequency Ratio (GFR) Index
The Geodetector Frequency Ratio (GFR) index was calculated using the geometric mean (Equation ( 5)) and the result is shown in Figure 17.The GFR index ranged from 3.70 to 5.17   Based on natural breaks classification interval of both optimized-DRASTIC and GFR Index, the GFR method predicted more vulnerable groundwater basins to groundwater nitrate contamination than optimized-DRASTIC method (Figures 11 and 17).The GFR Index had 29.27% of wells in the highest interval range (4.96-5.17)compared to only 12.5% of wells in the highest interval range (164-193) in optimized-DRASTIC Index (Table 7).The histogram of DRASTIC index and GFR index values is plotted to observe their distribution (Figures 18 and 19).The mean value for optimized-DRASTIC Index was 141.87 and that for GFR Index was 4.70.The DRASTIC Index values were positively skewed showing only few groundwater basins in the upper range.The GFR Index values were negatively skewed showing relatively higher number of groundwater basins in the upper range.

Assessment of Vulnerability Index
The results of optimized-DRASTIC and GFR Index were assessed using the Pearson's correlation analysis and Geodetector method.The higher Pearson's correlation coefficient (0.7) between percent of contaminated wells in each basin and the GFR Index suggests that GFR Index was a better predictor of groundwater contamination than the DRASTIC Index (Table 8).Also, spatial analysis variance (SVA) based Geodetector method was applied to measure the spatial association or consistency of both indices.The results showed higher PD value for GFR method (0.49) than that for DRASTIC method (0.062), again suggesting that vulnerability map of GFR method captured the spatial variability of contaminated wells of the CV aquifer better than the optimized-DRASTIC Index maps.In addition, the PD value for GFR method was statistically significant whereas that for DRASTIC was not.The results suggest that the GFR index values was more consistent with the spatial distribution pattern of groundwater nitrate contamination at groundwater basins of the CV.

Assessment of Vulnerability Index
The results of optimized-DRASTIC and GFR Index were assessed using the Pearson's correlation analysis and Geodetector method.The higher Pearson's correlation coefficient (0.7) between percent of contaminated wells in each basin and the GFR Index suggests that GFR Index was a better predictor of groundwater contamination than the DRASTIC Index (Table 8).Also, spatial analysis variance (SVA) based Geodetector method was applied to measure the spatial association or consistency of both indices.The results showed higher PD value for GFR method (0.49) than that for DRASTIC method (0.062), again suggesting that vulnerability map of GFR method captured the spatial variability of contaminated wells of the CV aquifer better than the optimized-DRASTIC Index maps.In addition, the PD value for GFR method was statistically significant whereas that for DRASTIC was not.The results suggest that the GFR index values was more consistent with the spatial distribution pattern of groundwater nitrate contamination at groundwater basins of the CV.

Discussions
This study developed the new GFR method and applied it to analyze the vulnerability of the Central Valley (CV) aquifer to the groundwater nitrate contamination.The GFR result was also compared with the result derived from optimized-DRASTIC method.The optimized-DRASTIC index values were generally higher in the eastern part of the CV and the GFR method predicted the high vulnerability in the San Joaquin and Tulare basin compared to the Sacramento Valley.The GFR method predicted higher number of basins in the higher index range than the optimized-DRASTIC method.
The DRASTIC method is a widely applied groundwater vulnerability assessment method used on local, regional, and national scale.The DRASTIC map can provide a general picture of the vulnerable area of the CV aquifer to the groundwater nitrate contamination.The hydrogeologic variables used in the DRASTIC method are the key variables in determining the susceptibility of the aquifer.The DRASTIC Index values, calculated using weight and rating values of the variables, have been found consistent with other validation techniques in several other studies.In this study, in order to capture the local hydrogeologic condition of the CV, we optimized the DRASTIC method by modifying the range of rating interval of DRASTIC variables using natural breaking classification to calculate the index values [14].In the GFR method, the incorporation of variables with significant PD values, normalized weight, frequency-ratio, and geometric mean to calculate the GFR Index generated the vulnerability map that was more consistent with the measured groundwater concentration values in the aquifer as measured by both Pearson's correlation and the PD value of Geodetector method.
The variables, weights, and rating values in DRASTIC method are preassigned.This pre-assignment of variables, weights, and rating in the DRASTIC method increases the chance of overlooking a variable that might be the underlying factor for increasing the susceptibility of the aquifer.Also, the subjective assignment of weights to the variables may not always correctly represent the influencing variable in a local aquifer.In this study, higher weight is assigned to the Depth to Water (DTW) in the shallow Sacramento Valley than San Joaquin and Tulare basin.However, the hydrogeologic condition of the CV in the southern part of the valley have changed due to hundreds of wells dug to pump the groundwater for irrigation for the last several decades.Although DTW is deep in this part of the valley, the residence time of contaminants is still short due to higher hydraulic gradient caused by heavy pumping, and groundwater generally flows towards the cone of depression [1].The groundwater basins in this part of the CV has the highest priority for groundwater evaluation and monitoring by California Groundwater Elevation Monitoring (CASGEM) [42] .The results of the optimized-DRASTIC method could have been impacted by the inaccurate weighting of DTW.Also, variables specified by DRASTIC method are not always easily available for the study area.In this study, impact of vadose zone (IVZ) data was not easily available for Central Valley.Here, we calculated the IVZ data based on depth to water and permeability [51].The selection process of site specific data to correctly represent contamination in the data processing is crucial to understand the contaminant's spatial distribution.The DRASTIC method has been criticized for its lack of ability to include the site-specific variables in the method.Landuse characteristics and other variables have been added to the DRASTIC method to increase its prediction accuracy.
We borrowed the Frequency-Ratio method from the landslide susceptibility method, which has been successfully applied in many landslides susceptibility studies.In the GFR method, Geodetector identifies precipitation, fertilizer, manure, elevation, and clay as the significant variables among twelve different variables and have direct spatial association with the distribution of the contaminated wells in the CV [26].Although, in this study, Geodetector utilized only 12 different explanatory variables to compute the PD values, the method can process any number of potential explanatory variables to determine their relevance based on the computed PD values and significance tests.In addition, this method can process both categorical and continuous variables, which makes it possible to include wider variety of potential explanatory variables.Furthermore, this method makes no assumptions about the data making it suitable to process nitrate data which are mostly skewed data as they are found in low concentration in normal condition.The GFR method offers a general framework for objectively selecting contributing factors (based on the significance test of PD values) and assigning their weights (based on normalized PD values), which can potentially be applied to any other study areas.Thus, GFR method is more effective than optimized-DRASTIC method for accurately capturing the vulnerability of groundwater contamination, as illustrated in this study in the CV aquifer.
The results of this study are comparable to the findings of previous studies performed in the CV.Ransom et al. [69] also found higher nitrate concentration in wells of San Joaquin and Tulare Basin than Sacramento valley.The Sacramento Valley is composed of fine grained sediments compared to southern part of the CV, which could reduce the susceptibility of Sacramento Valley.There are several studies which modify DRASTIC method by adding the variables or changing the intervals to assign rating values matching the site-specific condition.However, they are mostly site specific, whereas the GFR method offers a general framework that can be applied anywhere.
In this study, a total of 31 groundwater basins were observed with contaminated wells (>5 mg/L) out of 41 basins in the CV.The advantage of the groundwater susceptibility mapping using overlay-index approach is that a vulnerability index value can be calculated for all the basins using the rating and weight values [70].The visual comparison of two vulnerability index maps showed that optimized-DRASTIC method produced generally a higher index value in the eastern flank of the valley, while the GFR index was higher in San Joaquin and Tulare Basins.However, the GFR method was found to be more consistent with observed contamination data than the optimized-DRASTIC method, as measured by the Pearson's correlation coefficient and the PD value between the vulnerability index and percent of contaminated wells in the groundwater basins, which confirmed the validity of the newly developed GFR method.The GFR index values were negatively skewed, showing relatively higher numbers of groundwater basins in the upper range.The spatial variance analysis based Geodetector method was effective and can be used in different areas to measure the spatial association.However, high PD values obtained from the Geodetector method is a statistical output and does not necessarily mean causality.Therefore, we should still be careful in selecting the relevant variables.

Conclusions
The Central Valley aquifer of California is a highly productive land, where 75% of the land is used for agriculture.The heavy application of fertilizer and changing hydrogeological condition caused by irrigation has exacerbated the groundwater nitrate contamination over last several decades in the CV.Groundwater nitrate contamination above the EPA MCL in drinking water is a health hazard that could induce several health problems.The optimized-DRASTIC method and GFR method were applied to perform a comparative study on the vulnerability of groundwater nitrate contamination in the valley.The subjectively preassigned hydrogeologic variables and weights in the optimized-DRASTIC method may not accurately reflect the changed hydrogeologic conditions in San Joaquin and Tulare Basins region due to heavy pumping, which could have promoted downward percolation of nitrate in this part of the valley, and thus limited the accuracy of its output map.On the other hand, the GFR method integrated the frequency ratio method, which was proven to be effective in landslide susceptibility studies, and the Geodetector method, which has the ability to objectively select contributing factors (based on the significance test of PD values) and assign their weights (based on normalized PD values), and thus produced a more accurate groundwater contamination vulnerability index map as shown by the Pearson correlation coefficient and Geodetector PD values between the index value and percent of contaminated wells in the groundwater basins.In addition, the Geodetector method in GFR method makes no assumption about the data and works for both categorial and continuous data.These unique properties of GFR method render it an accurate and effective method for the groundwater vulnerability assessment.The GFR method offers water resources managers and policy makers a general framework to assess groundwater contamination vulnerability in any other study areas.
Author Contributions: A.S. and W.L. conceived and designed the research; A.S. performed data processing and analysis; W.L. contributed to the interpretation of the results; both authors wrote, revised, and improved the proposed article.
ISPRS Int.J. Geo-Inf.2018, 7, x FOR PEER REVIEW 4 of 28 groundwater quality problem in the CV.The changing hydrogeologic conditions and excess nitrogen fertilizer have complicated the fate and transport of contaminant in the valley[39].

Figure 2 .
Figure 2. Cross-section of San Joaquin Valley during predevelopment (top) and post-development (bottom) period.Figure from reference [40].

Figure 2 .
Figure 2. Cross-section of San Joaquin Valley during predevelopment (top) and post-development (bottom) period.Figure from reference [40].

Figure 3 .
Figure 3. Groundwater basins and well samples in Central Valley

Figure 3 .
Figure 3. Groundwater basins and well samples in Central Valley.

Figure 4 .
Figure 4. Distribution of depth to water in Central Valley

Figure 5 .
Figure 5. Recharge rate in Central Valley.

Figure 7 .
Figure 7. Soil Media in Central Valley.

Figure 11 .
Figure 11.DRASTIC Susceptibility Index in Central Valley.

Figure 18 .
Figure 18.Histogram of DRASTIC Index values.Figure 18. Histogram of DRASTIC Index values.

Table 2 .
Ratings and weights given to DRASTIC variables.

Table 3 .
Classification and rating values of geologic units.Ku, J, K, KJf, P, M, E, KI, Mc, Tc, KI, J, Tc, E, Ec, Ep, O, P, Qoa Sedimentary Dominated formation of all ages 6 gb, grMz, Tv, TvP, Mzv, um, grMz, gb, mv, m, Qvp, gr-m, m, um Metamorphic or Igneous unit 3 • Soil Media Soils are grouped based upon their composition and runoff potential and can be represented by soil hydrologic group.The soil hydrologic group data was extracted from soil data for Conterminous United States derived from the Natural Resource Conservation Service (NRCS) State Soil Geographic (STATSGO) data base.

Table 5 .
PD values and normalized PD values of significant predictor variables.

Table 6 .
Frequency ratio method of significant explanatory variables.

Table 7 .
Natural Breaks classification of DRASTIC and GFR Index.

Table 7 .
Natural Breaks classification of DRASTIC and GFR Index.

Table 8 .
PD values and Pearson's correlation for DRASTIC and GFR method.

Table 8 .
PD values and Pearson's correlation for DRASTIC and GFR method.
** statistically significant at the 0.01 level.