Analysis of Groundwater Nitrate Contamination in the Central Valley : Comparison of the Geodetector Method , Principal Component Analysis and Geographically Weighted Regression

Groundwater nitrate contamination in the Central Valley (CV) aquifer of California is a ubiquitous groundwater problem found in various parts of the valley. Heavy irrigation and application of fertilizer over the last several decades have caused groundwater nitrate contamination in several domestic, public and monitoring wells in the CV above EPA’s Maximum Contamination level of 10 mg/L. Source variables, aquifer susceptibility and geochemical variables could affect the contamination rate and groundwater quality in the aquifer. A comparative study was conducted using Geodetector (GED), Principal Component Analysis (PCA) and Geographically Weighted Regression (GWR) to observe which method is most effective at revealing environmental variables that control groundwater nitrate concentration. The GED method detected precipitation, fertilizer, elevation, manure and clay as statistically significant variables. Watersheds with percent of wells above 5 mg/L of nitrate were higher in San Joaquin and Tulare Basin compared to Sacramento Valley. PCA grouped cropland, fertilizer, manure and precipitation as a first principal component, suggesting similar construct of these variables and existence of data redundancy. The GWR model performed better than the OLS model, with lower corrected Akaike Information Criterion (AIC) values, and captured the spatial heterogeneity of fertilizer, precipitation and elevation for the percent of wells above 5 mg/L in the CV. Overall, the GED method was more effective than the PCA and GWR methods in determining the influence of explanatory variables on groundwater nitrate contamination.


Introduction
The Central Valley (CV) has been one of the most productive agricultural regions in the United States for the last 50 years, producing huge quantities of agricultural products.CV, with a total area of 47,000 km 2 , occupies less than 1% of the total farmland in the United States and still produces 8% of the country's agricultural output in cash value (Figure 1).The valley accounts for about one-sixth of the irrigated land and one-eighth of the groundwater pumpage in the United States [1].Approximately 75% of the irrigated land in California and 17% of the nation's irrigated land is in the CV [2].However, the large percent of agricultural land has come at a cost of heavy pumping of groundwater from wells for irrigation [3] and groundwater nitrate contamination from fertilizer application [4].Groundwater withdrawals from the CV principal aquifer is 13% of the country's total withdrawal, making it the second largest in the United States [5].Heavy withdrawal of groundwater and surface water diversion for irrigation have changed the groundwater flow pattern of the CV, which when compounded with the heavy nitrogen fertilizer application rate, has complicated the groundwater contamination by nitrate in the valley [6].The USGS National Water-Quality Assessment (NAWQA) program estimated a nitrate contamination greater than the national median in San Joaquin and Tulare basin of the CV [8].Counties in San Joaquin valley were detected above the Environmental Protection Agency(EPA)-maximum contaminant level ( MCL) of nitrate in over 40% of total 200 domestic well samples [9].Drinking water contaminated with nitrate can cause several health problems; some, like methemoglobinemia, can even cause death in children [10].
The application of fertilizer, manure, septic tanks and domestic animals are some of the major sources of nitrate contamination in the groundwater of the CV [11].However, the concentration of nitrate in the aquifers can be altered by several other environmental factors like rainfall, soil conditions, permeability, and the geochemical condition of the valley.For example, groundwater in the central part of the CV has a more reducing condition that converts nitrate to nitrogen gas (denitrification), and therefore, nitrate in groundwater is lower compared to the eastern part of the valley.The alluvial deposit in the eastern part is relatively coarse grained and receives more rainfall, creating an oxic condition, whereas small grained sediments and the long residence time of groundwater creates a reducing condition as it flows towards the axis of the valley [12,13].Rainfall is comparatively higher in northern Sacramento Valley than the southern Tulare Basin and San Joaquin Basin in the CV, which can dilute the concentration of nitrate or percolate dissolved nitrate into deeper parts of the aquifer.In the CV, the hydrogeological characteristics of San Joaquin Valley have already been impacted severely due to heavy pumping of groundwater, altering the groundwater flow pattern and causing most of the groundwater to flow towards the cone of depression [3].This change in the groundwater flow pattern could alter the transport mechanism of the nitrate dissolved in the groundwater as well.Therefore, several of these environmental variables could make it difficult to predict the fate and transportation of this contaminant in the CV.
Several studies in the San Joaquin Basin, Sacramento Valley and other counties have already confirmed high levels of nitrate in the groundwater of the CV [14][15][16][17][18]. Statistical methods have been used to study the relationship between groundwater nitrate contamination and predictor variables.For example, multivariate statistical methods like ordinary least square regression (OLS) have been applied in many studies to investigate the relationship between groundwater nitrate contamination The USGS National Water-Quality Assessment (NAWQA) program estimated a nitrate contamination greater than the national median in San Joaquin and Tulare basin of the CV [8].Counties in San Joaquin valley were detected above the Environmental Protection Agency(EPA)-maximum contaminant level ( MCL) of nitrate in over 40% of total 200 domestic well samples [9].Drinking water contaminated with nitrate can cause several health problems; some, like methemoglobinemia, can even cause death in children [10].
The application of fertilizer, manure, septic tanks and domestic animals are some of the major sources of nitrate contamination in the groundwater of the CV [11].However, the concentration of nitrate in the aquifers can be altered by several other environmental factors like rainfall, soil conditions, permeability, and the geochemical condition of the valley.For example, groundwater in the central part of the CV has a more reducing condition that converts nitrate to nitrogen gas (denitrification), and therefore, nitrate in groundwater is lower compared to the eastern part of the valley.The alluvial deposit in the eastern part is relatively coarse grained and receives more rainfall, creating an oxic condition, whereas small grained sediments and the long residence time of groundwater creates a reducing condition as it flows towards the axis of the valley [12,13].Rainfall is comparatively higher in northern Sacramento Valley than the southern Tulare Basin and San Joaquin Basin in the CV, which can dilute the concentration of nitrate or percolate dissolved nitrate into deeper parts of the aquifer.In the CV, the hydrogeological characteristics of San Joaquin Valley have already been impacted severely due to heavy pumping of groundwater, altering the groundwater flow pattern and causing most of the groundwater to flow towards the cone of depression [3].This change in the groundwater flow pattern could alter the transport mechanism of the nitrate dissolved in the groundwater as well.Therefore, several of these environmental variables could make it difficult to predict the fate and transportation of this contaminant in the CV.
Several studies in the San Joaquin Basin, Sacramento Valley and other counties have already confirmed high levels of nitrate in the groundwater of the CV [14][15][16][17][18]. Statistical methods have been used to study the relationship between groundwater nitrate contamination and predictor variables.For example, multivariate statistical methods like ordinary least square regression (OLS) have been applied in many studies to investigate the relationship between groundwater nitrate contamination and land use conditions [19][20][21].However, OLS has a drawback of overfitting the data by regressing random error in the model rather than the relationship between the variables when there are numerous variables.The OLS makes several assumptions on the linearity, homoscedasticity, normality and multicollinearity of the data, which are often difficult to apply to nitrate data as they are often skewed.These assumptions lead to a selection of only a few variables based on the researcher's knowledge or use statistical methods that reduce the number of variables.Logistic regression is another common method applied to find the areas above a certain threshold of MCL or background concentration level of nitrate in groundwater [22][23][24].However, logistic regression only calculates the probability of concentration above a certain threshold and does not calculate the direct concentration of contaminants.
Non-parametric tests like the Mann-Whitney test and Kruskal-Wallis test were applied to reveal differences in mean nitrate concentrations in well samples of different land-use types in the eastern San Joaquin Valley [25].Another study conducted at the eastern San Joaquin Valley, using the Kruskal Wallis test and Wilcoxon rank-sum test, confirmed a higher nitrate concentration in shallow aquifers due to non-point sources [26].The Regional Kendal test has also been applied to study the decadal trend of nitrate concentration in different physiographic sub-regions of the CV [13].
Recently, machine learning methods have been used to predict the groundwater nitrate contamination in the CV [27,28].Nolan et al. applied a statistical learning framework to domestic wells data to optimize the over prediction of three machine learning methods: Artificial Neural Network (ANN), Boosted Regression Tree (BRT) and Bayesian Network (BN).Their results showed BRT had the highest predictive performance.These models were also compared to the random forest regression (RFR) model and multiple linear regression (MLR); the results showed that the BRT model was comparable with RFR, but MLR predicted poorly [28].
Principal Component Analysis (PCA) is commonly applied to study the hydrogeochemical parameters of groundwater, including nitrate.PCA reduces the dimensions of data by using a linear combination of variables to capture the maximum variance in the data [29][30][31].The principal scores for each component can also be calculated, which are then used as the independent variables to build the predictive model [32][33][34] or compare it with other variables of interest [35].The correlation matrix between the variables and the loadings of each variable in the principal components are used to investigate the contamination process in the study area.The grouping of redundant variables in the principal components removes the multicollinearity and reduces the number of independent variables into principal components that still explain most of the variance in the data [35].Burrow et al. analyzed geochemical variables including nitrate and other physio-chemical factors using PCA in eastern San Joaquin Valley.The result showed that nitrate concentration were higher in coarse-grained sediments, soil with a low percent of clay and under oxidizing geochemical conditions [25].
Geographically Weighted Regression (GWR) has been commonly used in many studies to determine the effect of spatial heterogeneity on the explanatory variable.OLS assumes that relationships between explanatory variables and the outcome variable are homogeneous (or stationary) across the study area, but GWR assumes that the relationship varies across the study area (non-stationary).This method models the spatially varying relationships by generating individual regressions for each data point where nearby observations are given more weight.The method minimizes the spatial autocorrelation in the residuals and generates local coefficient maps to observe the spatial heterogeneity over the area [36].The GWR method has shown that water quality indicators varied as the landuse changed in the watersheds [37][38][39].Groundwater quantity was also found to change spatially depending on landuse type [40].However, studies have also been performed showing multicollinearities between the coefficients obtained from the GWR model [41].
Geodetector (GED) is a relatively new statistical method that measures spatial stratified heterogeneity and tests its significance [42,43].It can detect important contributing explanatory variables as well as the risk areas of contamination.In addition, it also quantifies if a geographical stratum (associated with one suspected explanatory variable) is more significant than another geographical stratum (of another suspected variable).Also, the method analyses whether two determinants can interact to increase or decrease the chance of contamination in the aquifer [42,44].The GED method has been applied to the analysis of groundwater nitrate contamination in the CV [45].Here, we compare the results of the GED method and those of the PCA and GWR methods to better understand the contamination process and to illustrate the advantages and additional information that the GED method provides in understanding the groundwater contamination process in the CV.The specific objectives of this research are as follows: (a) To understand the controlling explanatory variables of groundwater nitrate contamination in the CV using three statistical methods: Geodetector (GED), Principal Component Analysis (PCA) and Geographically Weighted Regression (GWR); (b) To identify contributing geographical stratum or the risk areas of contamination in the CV; (c) To perform a comparative study of the GED method with PCA and GWR in understanding the contamination process.

Methods
The watershed was selected as the basic analysis unit to quantify the effects of explanatory variables on groundwater nitrate contamination as groundwater and surface water interact in hydrologic processes at a watershed scale.The 12-digit hydrologic unit watersheds of the CV (Figure 2) were selected to process the data.Watershed data were downloaded from USGS-National Standard for the Spatial Data Accuracy and Watershed Boundary Dataset.This is the finest resolution of hydrologic unit hierarchy.There are total of 656 watersheds in the CV.
ISPRS Int.J. Geo-Inf.2017, 6, 297 4 of 25 The GED method has been applied to the analysis of groundwater nitrate contamination in the CV [45].Here, we compare the results of the GED method and those of the PCA and GWR methods to better understand the contamination process and to illustrate the advantages and additional information that the GED method provides in understanding the groundwater contamination process in the CV.The specific objectives of this research are as follows: (a) To understand the controlling explanatory variables of groundwater nitrate contamination in the CV using three statistical methods: Geodetector (GED), Principal Component Analysis (PCA) and Geographically Weighted Regression (GWR); (b) To identify contributing geographical stratum or the risk areas of contamination in the CV; (c) To perform a comparative study of the GED method with PCA and GWR in understanding the contamination process.

Methods
The watershed was selected as the basic analysis unit to quantify the effects of explanatory variables on groundwater nitrate contamination as groundwater and surface water interact in hydrologic processes at a watershed scale.The 12-digit hydrologic unit watersheds of the CV (Figure 2) were selected to process the data.Watershed data were downloaded from USGS-National Standard for the Spatial Data Accuracy and Watershed Boundary Dataset.This is the finest resolution of hydrologic unit hierarchy.There are total of 656 watersheds in the CV.Nitrate data were collected from the National-Water Quality Assessment (NAWQA) project; National Water-Quality Information System (NWIS); and the State Water Resource Control Board-Groundwater Ambient Monitoring and Assessment Program (GAMA).The data were downloaded from these websites for the period of 2002 to 2014.NAWQA and NWIS measured nitrate concentration as NO3-N and GAMA as NO3.To make the concentration data consistent, NO3 was converted to NO3-N.The dataset consists of a total of 2516 well samples spread throughout the Central Valley (Figure 3).Multiple measurements of nitrate data for each well sample were averaged over the study period to obtain the temporal average for each well sample.Nitrate data were collected from the National-Water Quality Assessment (NAWQA) project; National Water-Quality Information System (NWIS); and the State Water Resource Control Board-Groundwater Ambient Monitoring and Assessment Program (GAMA).The data were downloaded from these websites for the period of 2002 to 2014.NAWQA and NWIS measured nitrate concentration as NO 3 -N and GAMA as NO 3 .To make the concentration data consistent, NO 3 was converted to NO 3 -N.The dataset consists of a total of 2516 well samples spread throughout the Central Valley (Figure 3).Multiple measurements of nitrate data for each well sample were averaged over the study period to obtain the temporal average for each well sample.The explanatory variables and their sources are listed in Table 1.There are 12 different explanatory variables.Farm fertilizer, manure and cropland can be categorized as source variables as they directly contribute to the groundwater nitrate.Permeability, precipitation, slope, elevation and clay are the variables that determine the susceptibility of the aquifer to groundwater nitrate contamination.Dissolved oxygen, iron and manganese represent the geochemical condition of groundwater which determines the oxidation-reduction potential of groundwater that could affect the denitrification process of nitrate.Both nitrate data and all of the explanatory variables were processed at the watershed level in the CV.For each watershed, percent of wells above a mean nitrate concentration of 5 mg/L (PWN>5) were calculated (Figure 4) in ArcGIS using the formula:  The explanatory variables and their sources are listed in Table 1.There are 12 different explanatory variables.Farm fertilizer, manure and cropland can be categorized as source variables as they directly contribute to the groundwater nitrate.Permeability, precipitation, slope, elevation and clay are the variables that determine the susceptibility of the aquifer to groundwater nitrate contamination.Dissolved oxygen, iron and manganese represent the geochemical condition of groundwater which determines the oxidation-reduction potential of groundwater that could affect the denitrification process of nitrate.Both nitrate data and all of the explanatory variables were processed at the watershed level in the CV.For each watershed, percent of wells above a mean nitrate concentration of 5 mg/L (PW N>5 ) were calculated (Figure 4) in ArcGIS using the formula: Number of wells exceeding temporal avg.> 5 mg/L in a watershed Total number of wells in each watershed (1)  A nitrate concentration of 5 mg/L was selected as the threshold level because previous studies have used concentrations above 5 mg/L as the background concentration level for nitrate [13].Nitrate concentration in nature is usually less than 1 mg/L as Nitrate-N and therefore, concentrations above 5 mg/L can be attributed to other sources [56].A total of 1014 well samples had a temporal average greater than 5 mg/L.
Explanatory variables were also processed at the watershed level to correspond with the PWN>5 for each watershed.They were then reclassified into different zones to represent the spatially stratified heterogeneity of explanatory variables in the CV.Classification was performed using natural break classification as described in [57].Details on the processing of explanatory variables can be found in [45].The data set was then analyzed using three different methods: Geodetector (GED), Principal Component Analysis (PCA) and Geographically Weighted Regression (GWR).

Geodetector
The Geodetector (GED) method was applied using GeoDetector Software 2007 [42].GED reveals the relationship between the spatial distribution of the nitrate contamination pattern and that of the potential environmental risk factors.The assumption is that if an environmental factor leads to nitrate contamination, the contamination would exhibit a spatial distribution similar to that of the environmental factor [42].Instead of directly comparing the contamination rate with some measure of the environment factor (e.g., rainfall), it calculates the variance of the contamination rate within each zones of the environmental factor (e.g., each rainfall class) and compares the local variances of the contamination rate within each zone of the factor with the global variance of the contamination rate within the entire study area.
For example, in Figure 5a, we have watersheds in the CV used as the basic analysis unit for which we calculated the PWN>5 for each watershed and used this as the response variable.Figure 5b shows the CV subdivided into different rainfall sub-regions (low to high) based on the amount of rainfall in the valley.If we overlay the watershed unit over the rainfall map (Figure 5c), we can calculate the mean and variance of PWN>5 in each different rainfall sub-region.This variance of PWN>5 in each sub-region (local variance) is then compared with the variance of the entire CV (global variance) to calculate the Power of Determinant (PD) value.This can be represented by the formula: A nitrate concentration of 5 mg/L was selected as the threshold level because previous studies have used concentrations above 5 mg/L as the background concentration level for nitrate [13].Nitrate concentration in nature is usually less than 1 mg/L as Nitrate-N and therefore, concentrations above 5 mg/L can be attributed to other sources [56].A total of 1014 well samples had a temporal average greater than 5 mg/L.
Explanatory variables were also processed at the watershed level to correspond with the PW N>5 for each watershed.They were then reclassified into different zones to represent the spatially stratified heterogeneity of explanatory variables in the CV.Classification was performed using natural break classification as described in [57].Details on the processing of explanatory variables can be found in [45].The data set was then analyzed using three different methods: Geodetector (GED), Principal Component Analysis (PCA) and Geographically Weighted Regression (GWR).

Geodetector
The Geodetector (GED) method was applied using GeoDetector Software 2007 [42].GED reveals the relationship between the spatial distribution of the nitrate contamination pattern and that of the potential environmental risk factors.The assumption is that if an environmental factor leads to nitrate contamination, the contamination would exhibit a spatial distribution similar to that of the environmental factor [42].Instead of directly comparing the contamination rate with some measure of the environment factor (e.g., rainfall), it calculates the variance of the contamination rate within each zones of the environmental factor (e.g., each rainfall class) and compares the local variances of the contamination rate within each zone of the factor with the global variance of the contamination rate within the entire study area.
For example, in Figure 5a, we have watersheds in the CV used as the basic analysis unit for which we calculated the PW N>5 for each watershed and used this as the response variable.Figure 5b shows the CV subdivided into different rainfall sub-regions (low to high) based on the amount of rainfall in the valley.If we overlay the watershed unit over the rainfall map (Figure 5c), we can calculate the mean and variance of PW N>5 in each different rainfall sub-region.This variance of PW N>5 in each sub-region (local variance) is then compared with the variance of the entire CV (global variance) to calculate the Power of Determinant (PD) value.This can be represented by the formula: where Var Ci (i = 1, 2, . . .The value of PD ranges from 0 to 1.The PD value will be equal to 1 if the explanatory variable rainfall (C) completely controls the groundwater nitrate contaminations, and it will be zero if it is completely independent of groundwater nitrate contaminations.Values closer to 1 suggest greater contribution to groundwater nitrate by rainfall as a determinant variable.
Risk detector calculates the mean value of PWN>5 of each sub-region (MCi) where i = 1, 2, …, 5 and tests if the mean value is statistically significantly different from the mean value of all the remaining sub-regions.The explanatory variable, rainfall (continuous variable) is discretized from The value of PD ranges from 0 to 1.The PD value will be equal to 1 if the explanatory variable rainfall (C) completely controls the groundwater nitrate contaminations, and it will be zero if it is completely independent of groundwater nitrate contaminations.Values closer to 1 suggest greater contribution to groundwater nitrate by rainfall as a determinant variable.
Risk detector calculates the mean value of PW N>5 of each sub-region (M Ci ) where i = 1, 2, . . ., 5 and tests if the mean value is statistically significantly different from the mean value of all the remaining sub-regions.The explanatory variable, rainfall (continuous variable) is discretized from Level 1 to Level 5 (low to high), and risk detector identifies which level has the highest mean nitrate concentration compared to others and if it is statistically significant.
Ecological Detector compares if the geographical stratum of C (e.g., rainfall) contributes to the groundwater nitrate contamination more significantly than another geographical stratum of D (e.g., Permeability).If C (e.g., rainfall) has more control over the contamination, it will have a lower dispersion variance (σ 2 ).
Interaction detector calculates the combined effect of two explanatory variables in the CV.Individual PD values of variables are then compared with the combined PD values of given variables to assess whether they strengthen, weaken or are independent of each other, e.g., nonlinear enhancement exists if the PD value of the new factor is greater than the PD value of each individual factor summed together [42].More details of the method and statistical tests can be found in [42,43].

Principal Component Analysis
Principal Component Analysis (PCA) was used to analyze the data for all 12 explanatory variables using SPSS.PCA is a dimension reduction technique that reduced the 12 explanatory variables into a few number of principal components (PC) that explained most of the variance of the data.Eigenvalues based on the standardized correlation matrix of the data were utilized and rotated using varimax rotation to maximize the variation.This is achieved by rotating the original variables coordinate system into a new coordinate system (orthogonal to each other) towards the direction of maximum variance of the data.The first PC measured the maximum variance in the data set, the second PC measured less variance than the first PC and the third PC measured less than the second PC and so on.The contribution of each variable to the newly formed PC is called the loading.Higher loading of a variable in a PC means that variable explained most of the variance in that PC.These newly generated PCs are the uncorrelated variables derived from the correlated or redundant explanatory variables.All the assumptions of the PCA, i.e., that variables are continuous and linearly correlated, and adequate sampling size and outliers were checked before performing the test using SPSS [30,31].

Geographically Weighted Regression
Geographically Weighted Regression (GWR) was performed on the data using ArcGIS 10.4.A properly specified Ordinary Least Square (OLS) model was fit prior to the GWR analysis.All the assumptions of normality, multicollinearity and heteroskedasticity of the OLS model were met before performing GWR.The GWR model is a spatial regression model which estimates OLS-like regressions for each feature in the data set.The OLS model is a simple linear model where the relationship between dependent variable and explanatory variable is assumed constant over the study area and residuals are assumed to be independent.However, normal features that are nearby could be more related than those that are far away (Tobler's Law); hence, spatial autocorrelation exists if residuals are autocorrelated and the assumption of OLS is violated.GWR is an effective approach when there is spatial heterogeneity and the relationship between variables vary over the study area.The GWR model is therefore also known as a local model which considers the spatial non-stationarity in data by building a separate model for the explanatory variable and outcome variable.The key to analysis using the GWR model is defining a proper bandwidth or number of neighbors for each target feature.The shape and size of the bandwidth is dependent on user input for the kernel type, bandwidth method, distance, and number of features.Bandwidth or number of neighbors determines the degree of smoothing in the model.If the kernel type is fixed, the Gaussian kernel is used to solve each local regression analysis using a fixed distance.The adaptive kernel type adapts to the density of the neighbors.Bandwidth method specifies the extent of the kernel.Akaike Information Criterion (AIC) or CV (Cross Validation) selects the optimal distance or number of neighbors [58].

Geodetector Method
The results from the GED method are shown in Table 2.The PD values for source variables (fertilizer and manure) and aquifer susceptibility factors (precipitation, elevation, percent clay) were statistically significant variables with p-values less than 0.05.Geochemical variables (dissolved oxygen, iron and manganese) were not found to be statistically significant using this method.Risk Detector results (Table S1) showed geographical areas with low rainfall (San Joaquin and Tulare Basin) have higher PW N>5 (Figure 6).

Geodetector Method
The results from the GED method are shown in Table 2.The PD values for source variables (fertilizer and manure) and aquifer susceptibility factors (precipitation, elevation, percent clay) were statistically significant variables with p-values less than 0.05.Geochemical variables (dissolved oxygen, iron and manganese) were not found to be statistically significant using this method.Risk Detector results (Table S1) showed geographical areas with low rainfall (San Joaquin and Tulare Basin) have higher PWN>5 (Figure 6).The average PWN>5 at the low fertilizer level (Level 1) was 21.87% and showed an increasing trend of PWN>5 towards the high fertilizer level through level 5 (55.97%).The difference between the mean PWN>5 in Level 1 was significantly different from all the other higher fertilizer levels (Table S1 and Figure 7).
The average PWN>5 in manure Level 1 was 29.46% and showed an increasing trend with PWN>5 of 52.90% in Level 4. The average PWN>5 in Level 1 was significantly different from all the remaining The average PW N>5 at the low fertilizer level (Level 1) was 21.87% and showed an increasing trend of PW N>5 towards the high fertilizer level through level 5 (55.97%).The difference between the mean PW N>5 in Level 1 was significantly different from all the other higher fertilizer levels (Table S1 and Figure 7).
ISPRS Int.J. Geo-Inf.2017, 6, 297 10 of 25 higher levels of manure (Table S1 and Figure 8).The average PWN>5 was highest at the elevation Level 2 (54.08%), which was significantly different from both low-level elevation (Level 1) and high-level elevation (Level 3) (Table S1).Most of the cropland (Figure 9) in the CV was found in the Level 2 range of elevation (Figure 10) which is mostly Tulare Basin.The average PW N>5 in manure Level 1 was 29.46% and showed an increasing trend with PW N>5 of 52.90% in Level 4. The average PW N>5 in Level 1 was significantly different from all the remaining higher levels of manure (Table S1 and Figure 8).The average PW N>5 was highest at the elevation Level 2 (54.08%), which was significantly different from both low-level elevation (Level 1) and high-level elevation (Level 3) (Table S1).Most of the cropland (Figure 9) in the CV was found in the Level 2 range of elevation (Figure 10) which is mostly Tulare Basin.
Groundwater nitrate contamination decreased with increasing percent clay in the soil, as clay particles inhibit the percolation of water into the aquifer.The average PW N>5 was 48.06% at Level 1, which dropped to 26.97% at Level 3 at a statistically significant level.The average PW N>5 in Level 4 and 5 was lower than Levels 1 and 2, but the difference was not statistically significant (Table S1 and Figure 11).

Principal Component Analysis
The Principal Component Analysis (PCA) results showed a Kaiser-Meyer-Olkin (KMO) value of 0.63 (Table 3).The score of 0.50 is suggested as the minimum KMO score to conduct PCA analysis.Therefore, our analysis meets the criteria of sampling adequacy.The Bartlett's test of sphericity was significant (<0.05), rejecting the null hypothesis that the correlation matrix is an identity matrix, meaning all the variables are uncorrelated.Correlation Analysis (Table 4) showed the highest correlation between cropland and manure (0.65), clay and permeability (−0.64), cropland and precipitation (−0.58), manure and fertilizer (0.50).It can be noted that fertilizer, manure and cropland are positively correlated with each other.This points to the fact that fertilizer and manure are applied in the cropland and therefore the correlation between them is high.Precipitation is

Principal Component Analysis
The Principal Component Analysis (PCA) results showed a Kaiser-Meyer-Olkin (KMO) value of 0.63 (Table 3).The score of 0.50 is suggested as the minimum KMO score to conduct PCA analysis.Therefore, our analysis meets the criteria of sampling adequacy.The Bartlett's test of sphericity was significant (<0.05), rejecting the null hypothesis that the correlation matrix is an identity matrix, meaning all the variables are uncorrelated.Correlation Analysis (Table 4) showed the highest correlation between cropland and manure (0.65), clay and permeability (−0.64), cropland and precipitation (−0.58), manure and fertilizer (0.50).It can be noted that fertilizer, manure and cropland are positively correlated with each other.This points to the fact that fertilizer and manure are applied in the cropland and therefore the correlation between them is high.Precipitation is negatively correlated with fertilizer, manure and cropland, suggesting that rainfall is low in this area.The correlation between clay and permeability is high but negative.This is because the percolation of groundwater through a permeable area is high as water can easily percolate through its pore spaces, whereas clay inhibits the flow of water due to its small size particles.Iron and manganese are negatively correlated with dissolved oxygen because iron and manganese are found in low oxygen areas.Dissolved oxygen is negatively correlated with clay for the same reason, as clay size particles are small and have less pore space available for air.Based on the scree plot and Eigen value greater than 1 (Table 5), only 4 components were retained.PC-1 explained 25.75%, PC-2 explained 23.21%, PC-3 explained 12.57% and PC-4 explained 8.96% of the variance.The cumulative percent of the variance explained by these four components was 70.50%.Table 6 shows variables significantly contributing to each PC, with loading >0.60.For PC-1, fertilizer, cropland and manure all have positive loadings, while precipitation has a negative loading.Precipitation is low in San Joaquin Basin and decreases even more towards Tulare Basin.However, these are also the areas of high fertilizer, high manure and high cropland.Although rainfall is low, agriculture is performed here through irrigation by pumping groundwater from wells.Therefore, fertilizer, manure and cropland have positive loading and precipitation has a negative loading.This opposite relationship is also indicated by the correlation matrix where there is a negative correlation of precipitation with cropland (−0.58), fertilizer (−0.43) and manure (−0.49).
In PC-2, clay has the highest loading value of −0.80 followed by permeability (0.76) and manganese (−0.73).In areas of high permeability, nitrate dissolved in groundwater can easily move through the intergranular space between the rocks and can contaminate the aquifer.Therefore, permeability has high positive loadings.On the other hand, clay inhibits the flow of water as it is composed of very fine particles and hence it has a negative loading. Figure 12 shows that high permeable areas in the CV have low percent clay.Manganese, iron and dissolved oxygen represent the geochemical condition of the groundwater in the aquifer.Under anoxic conditions, nitrate can easily break down into nitrogen gas and reduce the concentration of nitrate in groundwater.This is because in the absence of oxygen, microbes in the groundwater preferentially use nitrate as an electron acceptor as a part of the redox reaction to produce energy.Therefore, the oxygen level represents the redox condition of the aquifer.Iron and manganese are also produced as a part of the redox reaction in groundwater, and their concentration increases under anoxic conditions.Therefore, manganese has a negative loading as it represents the anoxic condition, which can break down groundwater nitrate.This is also exhibited by PC-3, where dissolved oxygen has a positive loading, as nitrate concentration in groundwater is higher if the dissolved oxygen is higher and is low under anoxic conditions (Figure 13).The nitrate concentration generally decreases towards the deeper aquifer as anoxic conditions develop and denitrification takes place [13].PC-4 shows elevation and slope both have positive loadings.Slopes are higher at higher elevation.The contamination of groundwater from nitrate is higher at lower elevation as the depth to the water table decreases at lower elevation.

Geographically Weighted Regression
The OLS method was performed prior to the GWR method and showed that the Variance Inflation Factor (VIF) of the variables was less than 7.5 with no multicollinearity.The Jarque-Bera test was not significant (p-value = 0.16), indicating that residuals are not normally distributed.A normally distributed residual means the model is biased and may be missing a key explanatory variable.The adjusted-R 2 value is 0.21 and AIC value is 1611.28.R 2 measures the percent variance

Geographically Weighted Regression
The OLS method was performed prior to the GWR method and showed that the Variance Inflation Factor (VIF) of the variables was less than 7.5 with no multicollinearity.The Jarque-Bera test was not significant (p-value = 0.16), indicating that residuals are not normally distributed.A normally distributed residual means the model is biased and may be missing a key explanatory variable.The adjusted-R 2 value is 0.21 and AIC value is 1611.28.R 2 measures the percent variance

Geographically Weighted Regression
The OLS method was performed prior to the GWR method and showed that the Variance Inflation Factor (VIF) of the variables was less than 7.5 with no multicollinearity.The Jarque-Bera test was not significant (p-value = 0.16), indicating that residuals are not normally distributed.A normally distributed residual means the model is biased and may be missing a key explanatory variable.
The adjusted-R 2 value is 0.21 and AIC value is 1611.28.R 2 measures the percent variance measured by the model, and the Akaike Information Criterion (AIC) measures the model performance.The spatial autocorrelation (Moran's I) value was 0.026 and was not significant, confirming residuals were not spatially clustered.Joint F-statistics were significant (p-value < 0.05), showing the relationship between the dependent variable and explanatory variables were non-stationary and could be improved by the GWR model.Precipitation, fertilizer and elevation were the significant variables (p-value < 0.05) in the OLS model.The coefficient for precipitation was negative, and the coefficient for manure and elevation was positive with PW N>5 .
In this GWR model, Adaptive Kernel was used to provide the geographic weighting in the model as the observations were not regularly positioned in the study area.The output of the GWR shows that the number of nearest neighbors that have been used in the estimation of each set of coefficients is 119 and estimates the percent of data under each kernel.The residual square value is 109,976, which is the sum of the square residual.A smaller residual square means that the GWR mode.fits well with the observed data.The effective number, which measures the complexity of the model, is 16.63.The Sigma value (16.63) is the estimated standard deviation for the residuals, and small values are preferred.
The Akaike Information Criterion (AIC) was used as the bandwidth method, as this method automatically finds the bandwidth, minimizes the AIC value, and gives the best predictions.The AIC value decreased from 1611.28 in the OLS model to 1606.96 in GWR, showing an increase in the fit of the model (Table 7).Generally, models with a lower AIC value and a decrease in the AIC value of more than 3 is considered better.Here, the AIC value of GWR dropped by 5 in comparison with OLS.The GWR model also showed improvement over the OLS model in the overall fit of the model by improving the adjusted-R 2 value from 0.21 to 0.26.However, the local R 2 values ranged from 0.024 to 0.314.Based on the local R 2 values, the model accurately predicted around Sacramento Valley and San Joaquin Basin, but local R 2 values were low around Tulare Basin.
The standardized residual map of GWR showed five watersheds which had standardized residuals greater than 2 (model under-prediction) and only one watershed with a negative standardized residual (model over-prediction).Spatial Autocorrelations (Moran's I) was computed to test if the residuals are random.The value of Moran's I ranges from −1 (complete dispersion) to +1 (complete clustering).Moran's I of value 0 indicates complete spatial randomness.Moran's I for GWR is −0.00028, with p-value 0.96, indicating that we cannot reject the null hypothesis that there is no spatial autocorrelation but accept that residuals are randomly distributed.Any spatial dependencies which might have been present in the OLS model were removed by the GWR model (Figures 14 and 15).The local condition (COND) number of the GWR model ranged from 24 to 27 throughout the CV.Since the COND number is less than 30, there is no local multicollinearity.
The global coefficient for precipitation is −5.6, indicating overall there is a negative relationship between the number of nitrate-contaminated wells and precipitation, perhaps due to the dilution effect of the precipitation in the CV.However, the local map of precipitation coefficients shows (Figure 16) that the value ranged from −13.63 in the north (Sacramento Valley) to 4.10 in the south (Tulare Basin), revealing the spatial heterogeneity of the model.Sacramento receives higher rainfall than the southern parts of the CV and therefore could experience a dilution effect of nitrate in groundwater.Tulare Basin and San Joaquin Basin receive low rainfall, and irrigation is performed by excessive pumping of groundwater from wells.The depth to the water table is deeper in this part of the CV, and the groundwater flows towards the cone of depression.Also, the fertilizer and manure application rates are higher in this area.These conditions facilitate the percolation of nitrate into the aquifer.The local R 2 around this area also suggests that some other influential parameter could be added to the model.The global coefficient for precipitation is −5.6, indicating overall there is a negative relationship between the number of nitrate-contaminated wells and precipitation, perhaps due to the dilution effect of the precipitation in the CV.However, the local map of precipitation coefficients shows (Figure 16) that the value ranged from −13.63 in the north (Sacramento Valley) to 4.10 in the south (Tulare Basin), revealing the spatial heterogeneity of the model.Sacramento receives higher rainfall  The global coefficient for precipitation is −5.6, indicating overall there is a negative relationship between the number of nitrate-contaminated wells and precipitation, perhaps due to the dilution effect of the precipitation in the CV.However, the local map of precipitation coefficients shows (Figure 16) that the value ranged from −13.63 in the north (Sacramento Valley) to 4.10 in the south (Tulare Basin), revealing the spatial heterogeneity of the model.Sacramento receives higher rainfall by excessive pumping of groundwater from wells.The depth to the water table is deeper in this part of the CV, and the groundwater flows towards the cone of depression.Also, the fertilizer and manure application rates are higher in this area.These conditions facilitate the percolation of nitrate into the aquifer.The local R 2 around this area also suggests that some other influential parameter could be added to the model.The global coefficient for the fertilizer is 4.05, indicating an overall positive relationship between nitrate contamination and fertilizer application.All three hydrologic regions in the CV have watersheds with higher fertilizer loadings.The local coefficient map (Figure 17) revealed some spatial heterogeneity in the model.The local coefficient varied from 0 to 6.7.Fertilizer coefficients were relatively higher around Tulare Basin, consistent with heavy irrigation and fertilizer application followed by Sacramento Valley and San Joaquin Basin.
The global coefficient for elevation (Figure 18) is 10.77.The local coefficient ranged from 0 to 22.4.The local coefficient is lowest around Tulare Basin and highest around San Joaquin Basin.Elevation is lowest around Sacramento Valley and San Joaquin Basin, which is more vulnerable to groundwater contamination.This is consistent with the highest local coefficient values there.In general, the coefficient maps of GWR revealed the spatial heterogeneity of significant variables precipitation, fertilizer, and elevation, showing that local effects of groundwater contamination are prevalent in the CV.The global coefficient for the fertilizer is 4.05, indicating an overall positive relationship between nitrate contamination and fertilizer application.All three hydrologic regions in the CV have watersheds with higher fertilizer loadings.The local coefficient map (Figure 17) revealed some spatial heterogeneity in the model.The local coefficient varied from 0 to 6.7.Fertilizer coefficients were relatively higher around Tulare Basin, consistent with heavy irrigation and fertilizer application followed by Sacramento Valley and San Joaquin Basin.The global coefficient for elevation (Figure 18) is 10.77.The local coefficient ranged from 0 to 22.4.The local coefficient is lowest around Tulare Basin and highest around San Joaquin Basin.Elevation is lowest around Sacramento Valley and San Joaquin Basin, which is more vulnerable to groundwater contamination.This is consistent with the highest local coefficient values there.In general, the coefficient maps of GWR revealed the spatial heterogeneity of significant variables precipitation, fertilizer, and elevation, showing that local effects of groundwater contamination are prevalent in the CV.

Discussion
This study employed three different methods: GED, PCA and GWR to analyze the groundwater contamination of nitrate in the CV aquifer.The GED method identified precipitation, fertilizer, elevation, manure and clay as the significant variables contributing to the groundwater nitrate contamination.It showed geographic areas of high fertilizers, high manure, low percent clay and low rainfall areas (mainly San Joaquin Basin and Tulare Basin) as high average PWN>5.The

Discussion
This study employed three different methods: GED, PCA and GWR to analyze the groundwater contamination of nitrate in the CV aquifer.The GED method identified precipitation, fertilizer, elevation, manure and clay as the significant variables contributing to the groundwater nitrate contamination.It showed geographic areas of high fertilizers, high manure, low percent clay and low rainfall areas (mainly San Joaquin Basin and Tulare Basin) as high average PW N>5 .The interaction detector indicated that precipitation interacted well with fertilizer and manure to enhance nitrate contamination as reflected by the higher combined PD value.
The PCA method was used to group the explanatory variables into four different components.The first four components with Eigenvalues greater than 1 explained 70.50% of the variances.The explanatory variables were grouped into different components, and their loading values in each component explained the contribution of each variable into the components.Fertilizer, manure, cropland and precipitation were grouped as redundant variables in PC-1.The positive and negative loadings on PC and the correlation matrix revealed the relationship between explanatory variables.
The GWR method captured the spatial heterogeneity of explanatory variables in the CV.This method improved the model compared to the OLS model by increasing the adjusted R-square and decreasing the AIC value.This method also reduced the spatial autocorrelation of residuals, strengthening the model.The coefficient map of precipitation, fertilizer and elevation was very informative in analyzing how the contribution of explanatory variables to PW N>5 changes over the area of the CV.
The utilization of three different methods helped to demonstrate the groundwater nitrate contamination process in the CV from different perspectives.The application of three different methods not only allowed comparison between the methods but also extracted more information about the contamination process in the CV.While GED identified key contributing explanatory variables and vulnerable geographic areas, PCA grouped the variables into different PCs to understand them in aggregation as they measure the same construction in the data.They are also known as redundant data and often cause multicollinearity in data analysis.For example, GED did not detect cropland as a significant variable, but PC-1 listed cropland, fertilizer, manure and precipitation as variables with loadings >0.60.From PC-1, it can be inferred that fertilizer and manure are applied in the cropland in areas with low rainfall, as precipitation has a negative loading.Also, the risk detector in the GED method showed increasing PW N>5 as the level of cropland increased.Similarly, clay was significant in the GED method, and PC-2 also listed clay, permeability and manganese with high loading values.Clay (negative loading) and permeability are the reverse of each other as permeable layers have higher pore spaces and are often connected to facilitate groundwater flow, whereas clay consists of small sized particles that make water percolation difficult.Manganese also has a negative loading as this element is found under reducing conditions in high clay areas.Dissolved oxygen and iron were not significant in the GED method but had high loading in PC-3.Iron is found under reducing conditions (low oxygen); therefore, it has a negative loading compared to dissolved oxygen.PW N>5 also progressively increased at a high dissolved oxygen level as nitrate is found under aerobic conditions, and the denitrification process breaks down nitrate into nitrogen gas under low dissolved oxygen.Elevation and slope both have positive loadings on PC-4 and elevation was also significant in the GED method.
The GWR, as a statistical method, captured the spatial heterogeneity of the explanatory variables and the overall fit of the model increased for the GWR model as compared to the OLS model.The spatial autocorrelation also showed residuals of the predictor variables that were more random in the GWR model.The OLS model did not show manure as a significant variable, but the GED method did.There is a significant difference in the PW N>5 between the area of low manure and the area of high manure.The coefficient map of precipitation showed a slightly positive coefficient in the southern part of San Joaquin Basin, with a generally clear trend of a slightly negative coefficient in Tulare Basin and more negative coefficient in Sacramento Valley.Since the precipitation is low around Tulare Basin and increases towards northern Sacramento Valley, this could indicate that precipitation is diluting the groundwater nitrate contamination in the well samples.This trend was also exhibited by the GED model, showing decreasing PW N>5 as the precipitation increases from south to north in the CV.The coefficient map of fertilizer also showed a higher coefficient in Tulare Basin and Sacramento Valley.There are numerous watersheds with high fertilizer rates in this part of the CV.The risk detector in the GED method also calculated an increasing average PW N>5 at five different levels of increasing fertilizer rate.The coefficient map of elevation was high in Sacramento Valley and San Joaquin Basin.The San Joaquin Basin and Sacramento Valley had the highest positive coefficient, indicating its contribution to the groundwater nitrate.The GED method also revealed high PW N>5 in Sacramento Valley and San Joaquin Basin.
The simultaneous observation of three different results provides strong confidence in the findings of these methods, either by complimenting each other or reconfirming the findings, which gives more confidence to the researcher about the contamination process in the CV.In addition, the findings of the interaction detector in the GED method point to the increased non-linear enhancement of precipitation with permeability (0.46), elevation (0.41), dissolved oxygen (0.43), fertilizer (0.38), manure (0.39) and clay (0.36), and confirms that Tulare Basin and San Joaquin Basin, as low rainfall areas, interact strongly with other variables in enhancing the groundwater nitrate contamination in the CV.However, careful analysis is required when analyzing data using different methods, as each method might have its own limitations.The GED method has a drawback of producing different results when the interval of explanatory variables changes, which defines a different geographical area.This study used natural break classification as the optimum classification method based on [57].The PCA method also requires several assumptions like continuous variables, linear relationship between variables, sampling adequacy, correlation between variables and removal of significant outliers before the test is performed.Detail knowledge about the variables is also required to be able to explain the output, as redundant variables are grouped into principal components.The correlation matrix in PCA aids in the understanding of relationships between explanatory variables.The GWR method also has the limitation of multicollinearity and kernel bandwidth selection.The regression residuals need to be analyzed carefully to make sure that there is no spatial autocorrelation to prevent model underand over-prediction.The coefficient map should be carefully analyzed to observe its influence on dependent variables over the entire spatial region.
The results of this study are consistent with previous findings of groundwater nitrate contamination in the CV.The San Joaquin and Tulare Basin have been identified as regions vulnerable to groundwater nitrate contamination in the CV due to excess fertilizer and other hydrogeologic conditions.Recent studies applying machine learning methods such as boosted regression tree, artificial neural network and Bayesian network to nitrate data for shallow wells also predicted higher nitrate concentrations in Tulare Basin and San Joaquin Valley.Fertilizer, cropland and landuse data have been collected around sampled wells by establishing a buffer region around them in many studies [9,28].Here, we analyzed the variables at the watershed level to closely represent the natural boundary and used PW N>5 to understand the overall status of the watershed.This study provides a consistent outlook on the significant explanatory variables as well as how they vary over the CV, the interaction between the variables and data redundancy.These are added advantages of this study compared to previous studies, which used only non-parametric tests to compare average values of nitrate across different landuse types, performed temporal analysis comparing the decadal changes in nitrate concentration or analyzed the data above a threshold level.Statistical methods are widely used nowadays to analyze groundwater nitrate contamination data over a large spatial scale; however, it is still a challenge to find long-term and spatially well-covered data.Consistent temporal and spatial coverage of data is essential to any statistical analysis to remove bias in the study.Regular long-term monitoring of wells and data consistency are required to ensure effective statistical inference.
Overall, this comparative study showed that the GED method with its four different detectors was able to capture all the information revealed by the PCA and GWR methods.In addition, the GED method has the following advantages over the PCA and GWR methods.(1) It does not make assumptions about the data, in contrast to PCA or GWR, leading to easier data preparation and wider applicability; (2) It not only reveals which factors are more important but also which areas of each variable are more vulnerable; (3) It can examine the interaction between different factors; (4) It works well with both categorical and continuous data; (5) It is easier to interpret the contribution of individual variables and is thus more useful for policy makers.

Conclusions
This study is focused on the comparison of three different statistical methods: GED, PCA and GWR to study the groundwater nitrate contamination in the CV.The study was conducted at the watershed level in the CV by examining the relationship between nitrate contamination (as measured by percent of wells above a background concentration) in each watershed and twelve explanatory variables.The GED method was very effective in determining precipitation, fertilizer, elevation, manure and clay as the significant variables.A higher percent of contaminated wells was observed in areas where percent of cropland, fertilizer and manure were high.San Joaquin and Tulare Basin have a higher number of wells contaminated with nitrate even though annual precipitation is relatively low.This can be attributed to the higher fertilizer application rate in agricultural land and to changing hydrologic conditions over the years from groundwater pumping that could enhance the downward percolation of nitrate into the aquifer.The PCA results grouped precipitation, manure, fertilizer and cropland with high loadings on PC-1, suggesting their similar construct in the data set and that could cause multicollinearity problem in analysis.The GWR method captured the spatial heterogeneity of precipitation, fertilizer and elevation with respect to PW N>5 .The local coefficients of precipitation were only slightly positive in San Joaquin Basin and Tulare Basin, and an increasing negative coefficient was observed towards northern Sacramento Valley.The coefficients of fertilizer were high around Tulare Basin where more fertilizer is applied.The coefficient of elevation was highest around Sacramento and San Joaquin Basin as these are relatively low elevation areas which are more susceptible to aquifer contamination.
Overall, the GED method was found to have added advantages over PCA and GWR as the results of GED are manifold compared to PCA and GWR.PCA is often used to reduce redundant data into fewer dimensions.However, PCA makes it difficult to interpret the grouped variables.The GWR model acts as a spatial microscope to reveal spatial changes of variables over the region but still suffers from multicollinearity that needs to be addressed using the OLS model.Although GWR can measure the changing coefficients over the study area, the GED method can directly compare different geographical areas based on the average value of the nitrate incidence rate.The less strict data requirement in the GED method in conjunction with its multiple outputs makes it an effective tool in understanding the overall nitrate contamination process in the Central Valley and beyond.

Figure 2 .
Figure 2. CV watersheds as the basic analysis unit.

Figure 2 .
Figure 2. CV watersheds as the basic analysis unit.

Figure 3 .
Figure 3. Well sampling locations for groundwater NO 3 concentration.

Figure 5 .
Figure 5. Illustration of the Geodetector method and basic analysis unit (watershed): (a) basic analysis unit watershed; (b) sub-regions of the study area based on one factor (rainfall); (c) overlay of the basic analysis unit and sub-regions for Geodetector analysis.

Figure 5 .
Figure 5. Illustration of the Geodetector method and basic analysis unit (watershed): (a) basic analysis unit watershed; (b) sub-regions of the study area based on one factor (rainfall); (c) overlay of the basic analysis unit and sub-regions for Geodetector analysis.

Figure 6 .
Figure 6.Distribution of rainfall in CV.

Figure 6 .
Figure 6.Distribution of rainfall in CV.

Figure 7 .
Figure 7. Distribution of fertilizer loading in the CV.

Figure 7 .
Figure 7. Distribution of fertilizer loading in the CV.

Figure 7 .
Figure 7. Distribution of fertilizer loading in the CV.

Figure 8 .
Figure 8. Distribution of manure loading in the CV.Figure 8. Distribution of manure loading in the CV.

Figure 9 .
Figure 9. Percent of cropland in the CV.Figure 9. Percent of cropland in the CV.

Figure 9 .Figure 9 .
Figure 9. Percent of cropland in the CV.Figure 9. Percent of cropland in the CV.

Figure 10 .
Figure 10.Elevation in the CV.

Figure 11 .
Figure 11.Percent of clay in the CV.

Figure 12 .
Figure 12.Distribution of permeability in the CV.

Figure 13 .
Figure 13.Distribution of dissolved oxygen in the CV.

Figure 13 .
Figure 13.Distribution of dissolved oxygen in the CV.

Figure 13 .
Figure 13.Distribution of dissolved oxygen in the CV.

Figure 15 .
Figure 15.Standardized residuals of the GWR model.

Figure 15 .
Figure 15.Standardized residuals of the GWR model.

Figure 15 .
Figure 15.Standardized residuals of the GWR model.

Figure 16 .
Figure 16.Coefficients of precipitation in the CV.

Figure 16 .
Figure 16.Coefficients of precipitation in the CV.

25 Figure 17 .Figure 17 .
Figure 17.Coefficient of fertilizer in the CV Figure 17.Coefficient of fertilizer in the CV.

Figure 17 .Figure 18 .
Figure 17.Coefficient of fertilizer in the CV

Figure 18 .
Figure 18.Coefficient of elevation in the CV.

Table 1 .
Explanatory variables and data sources.

Table 1 .
Explanatory variables and data sources.

Table 2 .
PD values for explanatory variables.

Table 2 .
PD values for explanatory variables.

Table 3 .
KMO and Bartlett's Test for PCA analysis.

Table 4 .
Correlation Matrix for explanatory variables.
** Correlation is significant at the 0.01 level.*Correlation is significant at the 0.05 level.(SeeTable2forfull variable names).

Table 5 .
Total variance explained using the PCA method.
See Table2for full variable names.

Table 7 .
Comparison between OLS and the GWR model.