Accounting for and Predicting the Influence of Spatial Autocorrelation in Water Quality Modeling

Several studies in the hydrology field have reported differences in outcomes between models in which spatial autocorrelation (SAC) is accounted for and those in which SAC is not. However, the capacity to predict the magnitude of such differences is still ambiguous. In this study, we hypothesized that SAC, inherently possessed by a response variable, influences spatial modeling outcomes. We selected ten watersheds in the USA and analyzed if water quality variables with higher Moran’s I values undergo greater increases in the coefficient of determination (R2) and greater decreases in residual SAC (rSAC). We compared non-spatial ordinary least squares to two spatial regression approaches, namely, spatial lag and error models. The predictors were the principal components of topographic, land cover, and soil group variables. The results revealed that water quality variables with higher inherent SAC showed more substantial increases in R2 and decreases in rSAC after performing spatial regressions. In this study, we found a generally linear relationship between the spatial model outcomes (R2 and rSAC) and the degree of SAC in each water quality variable. We suggest that the inherent level of SAC in response variables can predict improvements in models before spatial regression is performed.


Introduction
Water is an element crucial for life on Earth and is closely linked to the well-being of societies as well as the sustainability of aquatic ecosystems.A combination of natural and anthropogenic factors can adversely impact water quality.Human impacts involve general land use practices (e.g., agriculture, irrigation practices, urbanization, and deforestation), while natural factors include slope, elevation, vegetation cover, soil type, precipitation, and streamflow [1][2][3].River characteristics are generally dependent upon land use and geomorphological features of the watershed under study.In addition, water use patterns associated with the location of a region and its interactions with neighboring regions influence the quality of water bodies [4].These factors are responsible for the spatial variability of water quality, and are often treated as predictor variables in many hydrologic models [5].To provide better insights to future watershed management policies, understanding spatial processes associated with water quality variables is of extreme importance.
Space serves a vital role in structuring hydrological systems.Spatial autocorrelation (SAC) is an inherent property of spatial features such as streams and rivers [6].Legendre defined the concept of SAC as "the property of random variables taking values, at pairs of locations a certain distance apart, that are more similar (positive autocorrelation), or less similar (negative autocorrelation) than expected for randomly associated pairs of observations" (p.1659) [7].For example, causes of positive autocorrelation in stream water quality could be associated with similarities in local habitats or turbulent mixing and water chemistries of stream flows.In contrast, specific local built structures, such as beaver dams, fallen trees in stream channels, and territorial fishes, could be causes of negative SAC [8].Given these interactions over space (i.e., water flow from upstream to downstream areas, local biota, and water use patterns), it is necessary to consider the presence and potential effects of SAC in water quality modeling.
Several water quality studies have compared outcomes between spatial and non-spatial regressions [2][3][4]10,11,[27][28][29].In general, spatial models presented significant increases in R 2 values and decreases in residual SAC (rSAC), indicating that spatial model performance exhibited clear improvements over the non-spatial approach.However, as per the literature on hydrological modeling, it is still uncertain when such improvements become large or small.Assuming that each water quality variable presents a unique degree of inherent SAC, we hypothesize that this SAC (possessed by a response variable; i.e., a water quality variable) influences the outcomes of spatial modeling.We test if water quality variables with a higher amount of SAC would exhibit greater improvement in model outcomes than those with a lower amount of SAC (see Figure 1).We evaluate this hypothesis across divergent regions of the USA to enable a general understanding of the effect of SAC possessed by water quality variables. W examine if SAC is a consistent determinant of the magnitude of model improvements even when watershed characteristics diverge.If this is indeed the case, we can potentially determine the degree of improvement in model fit before performing a spatial regression simply by measuring the inherent SAC level of a water quality variable.This study can also serve as a useful screening technique where modelers could use Moran's I to predict the spatial pattern in the independent variable using a spatially explicit method.
ISPRS Int.J. Geo-Inf.2018, 7, 64 2 of 24 turbulent mixing and water chemistries of stream flows.In contrast, specific local built structures, such as beaver dams, fallen trees in stream channels, and territorial fishes, could be causes of negative SAC [8].Given these interactions over space (i.e., water flow from upstream to downstream areas, local biota, and water use patterns), it is necessary to consider the presence and potential effects of SAC in water quality modeling.Numerous studies in ecology, geography, and hydrology have noted the importance of accounting for SAC [9][10][11][12].These studies show that ignoring SAC can bias model outcomes and parameter estimates, leading to poor statistical inference and violation of the independence assumption of conventional regression approaches [8,[13][14][15][16].For example, models that ignore spatial effects (e.g., ordinary least squares; OLS) are likely to produce autocorrelated residuals violating the independent errors assumption.This can inflate the Type I error rate, wrongfully rejecting a null hypothesis.Many spatial approaches have been developed in order to overcome such limitations of non-spatial counterparts.These approaches include, but are not restricted to, regression kriging, simultaneous autoregressive modeling, conditional autoregressive modeling, spatial lag modeling, spatial error modeling, spatial eigenvector mapping, and geographically weighted regression [8,9,[17][18][19][20][21][22][23][24][25][26].
Several water quality studies have compared outcomes between spatial and non-spatial regressions [2][3][4]10,11,[27][28][29].In general, spatial models presented significant increases in R² values and decreases in residual SAC (rSAC), indicating that spatial model performance exhibited clear improvements over the non-spatial approach.However, as per the literature on hydrological modeling, it is still uncertain when such improvements become large or small.Assuming that each water quality variable presents a unique degree of inherent SAC, we hypothesize that this SAC (possessed by a response variable; i.e., a water quality variable) influences the outcomes of spatial modeling.We test if water quality variables with a higher amount of SAC would exhibit greater improvement in model outcomes than those with a lower amount of SAC (see Figure 1).We evaluate this hypothesis across divergent regions of the USA to enable a general understanding of the effect of SAC possessed by water quality variables.We examine if SAC is a consistent determinant of the magnitude of model improvements even when watershed characteristics diverge.If this is indeed the case, we can potentially determine the degree of improvement in model fit before performing a spatial regression simply by measuring the inherent SAC level of a water quality variable.This study can also serve as a useful screening technique where modelers could use Moran's I to predict the spatial pattern in the independent variable using a spatially explicit method.

Study Areas
The study areas are basins located in 10 states of the USA.The locations vary from east to west of the country.We analyzed water quality parameters in watershed and sub-watershed segments in Arizona (AZ), California (CA), Colorado (CO), Delaware (DE), Idaho (ID), Iowa (IA), Kansas (KS), Kentucky (KY), Louisiana (LA), and Virginia (VA).The basins were delineated by the U.S. Geological Survey (USGS), which states that as per the fifth and sixth levels of classification, these basins are smaller scale hydrologic units.Overall, their areas ranged from 150 km 2 to 764 km 2 .The climate and geology of the regions vary significantly due to their differences in latitude, longitude, and altitude.Tables 1 and 2 briefly present the climatological and geological characteristics of each state, and specific site characteristics in terms of area and water quality parameters, respectively.Figure 2 illustrates the watershed shapes and their land cover characteristics.

Dependent Variables
Water quality data from 2011 to 2017 were obtained online from the national Water Quality Portal (WQP) [30].The WQP integrates publicly available water quality data from three very important and widely used sources for research in the US: the USGS National Water Information System (NWIS), the EPA STOrage and RETrieval (STORET) Data Warehouse, and the United States Department of Agriculture (USDA) Sustaining the Earth's Watersheds Agricultural Research Data System (STEWARDS) through the Water Quality eXchange (WQX).
Based on the data availability and site locations, 29-54 sampling stations were selected from each study watershed (Table 2).Accounting for temporal variability in each watershed, the data were selected within the same week, month, or season.Therefore, no seasonality effect was considered in this study.Because we collected water quality data from different sources as explained above, the number and type of variables varied across watersheds (Table 2).These water quality variables were treated as dependent variables in this research.

Delineation of Upstream Area
Characteristics of the sub-watershed area upstream of sampling stations affect water quality variables [11].Thus, sub-watershed boundaries were delimited using the 'ArcHydro' package tool of ArcGIS 10.3 (Environmental Systems Research Institute, Redlands, CA, USA).We downloaded spatial stream data from the 2016 US Geological Survey (USGS) National Hydrography Dataset [31].The distance between stations varied, as did the size of each upstream area delineated.Land use characteristics as well as topography and soil far from the stream channel might contribute less to changes in water quality across space [3].Thus, we used the upstream area to separate the stream network specific to each station, and delineated the riparian zone around the stream.Many studies have conducted analyses at the riparian area scale, mainly by considering a buffer area on each side of the stream.Overall, there was no specific buffering distance recommended [3,11,32].In this study, we used a buffer zone of 50 m each side of the stream (i.e., a 100 m buffer in total) as the area that can contribute the maximum to water quality changes (Figure 3).We performed these analyses for all watersheds in this study.
changes in water quality across space [3].Thus, we used the upstream area to separate the stream network specific to each station, and delineated the riparian zone around the stream.Many studies have conducted analyses at the riparian area scale, mainly by considering a buffer area on each side of the stream.Overall, there was no specific buffering distance recommended [3,11,32].In this study, we used a buffer zone of 50 m each side of the stream (i.e., a 100 m buffer in total) as the area that can contribute the maximum to water quality changes (Figure 3).We performed these analyses for all watersheds in this study.

Independent Variables
Using the buffer zones of the upstream area, we extracted the land use, topography, and soil types associated with each sampling station.These variables were treated as independent variables in the subsequent modeling.The summary of these variables is shown in Table 3.We downloaded the land use raster with 30 m resolution from USGS The National Map-2011 National Land Cover Database (USGS TNM-NLCD) [33].In this study, we considered the percentage of four major land use types surrounding stream networks: urban, agriculture, forest, and wetland.To extract this information, we used the 'Zonal Statistics' toolset in ArcGIS 10.3.The percentage of urban area in each upstream buffer zone was calculated using the sum of the low-, medium-, and high-intensity urbanization, and open space values in the land use raster.The sum of values for pasture and cultivated crop was used to calculate the percentage of agricultural land in the area.The values for deciduous forest, evergreen forest, and mixed forest were used to arrive at the percentage of forest, while the values for woody wetlands and emergent herbaceous wetland were combined to calculate wetland percentage.For the topographic variables, we used 10 m resolution digital elevation models (DEMs) downloaded from USGS the National Map Elevation Products (USGS TNM 3DEP) [34].Using the same upstream area and zonal statistic toolset, we extracted the mean and standard deviation of the elevation and slope respectively for each station's upstream area.These variables were used to account for topographic complexity.
We downloaded the hydrological soil groups (HSGs) from the Natural Resources Conservation Service's (2017 NRCS) Soil Survey Geographic (SSURGO) database [35].We extracted the percentages of A, B, C, D, A/D, B/D, and C/D categories of soil for each site.The HSGs are categorized by the hydraulic conductivity level of a soil and how much runoff it produces.This is usually associated with the percentage of sediment grain sizes a soil presents.Typically, group A soils have a low runoff capacity because the water transmissivity through the soil profile is very high.Thus, group A soils are composed of a high percentage of sediments with large grain size, such as sand or gravel.Group B soils have a moderate runoff capacity.Nevertheless, water flows freely through the soil profile and the percentage of large-sized grains is high.In this case, however, small grain size sediments such as clay can reach up to 20 percent of the total.Group C soils have a moderately high runoff capacity and have a higher clay percent, with less than 50 percent of sand.Group D soils are characterized as having the highest percentage of fine grains such as clay and silt.The dual HSGs (A/D, B/D, and C/D) are wet soils where the water table is within 60 cm below the surface but can still be drained adequately.The first letter indicates the drained condition, and the second, the undrained [36].

Data Preprocessing
We tested the normality of each dependent and independent variable using IBM SPSS Statistics for Windows Version 23.0 (Armonk, NY, USA).In this study, the independent variables are likely to present a high level of correlations due to their nature.For example, agriculture and urban zones are land use types that might express a negative relationship because, as the area under agricultural use increases, the urbanized areas will tend to decrease.Thus, to account for the multicollinearity in the subsequent modeling, we applied principal component analysis (PCA), a multivariate technique.This technique reduces the dimensionality of a multivariate dataset where variables are significantly interrelated.This reduction results in principal components (PCs), which are considered uncorrelated variables [37,38].PCA is useful because it simplifies the description of the independent variables and the modeling procedure.We divided the independent variables into three main groups: land use, topography, and soil.Land use considered the percentage of urban, agriculture, wetland, and forest areas.The topographic group encompassed the mean and standard deviation values of slope and elevation.The soil groups represented the percentage of A, B, C, D, A/D, B/D, and C/D soil types (Table 3).Overall, we had three main PC groups used as the predictors in the models.Each variable category presents one to three PCs, depending on how significant the variables in the group are to the area of study.This means that a model can have three to nine principal components as independent variables.

Testing for Spatial Autocorrelation (SAC)
We quantified the inherent degree of SAC for each water quality parameter using Moran's I function (Equation ( 1)): where, X i and X j refer to the water quality at station i and station j, respectively.X is the overall mean water quality, and W ij is the weight matrix.Moran's I values vary between −1 to 1 for maximum negative and positive autocorrelation, respectively.No-zero values of Moran's I indicate that values at a certain geographical Euclidian distance are more similar (positive autocorrelation) or less similar (negative autocorrelation) than expected for randomly assigned values [39,40].We used the geographic coordinate system based on angular values (longitude and latitude) considering the North American 1983 as the datum for the distance calculation.We acknowledge that we did not perform projection in this study, which would have been a serious issue if we were concerned with region-scale modeling crossing multiple states.Instead, the current study examined the water quality of several stations within local watersheds (<ca.764 km 2 ).Therefore, using the Euclidean distance should not be a critical problem.

Statistical Models
GeoDa version 1.8 (Chicago, IL, USA) was used to run three models in this paper.First, OLS, representing the non-spatial model, is a multiple linear regression approach (Equation (2)), where the response variable is the water quality parameter and the independent variables are the PCs of the topographic, land cover, and soil groups: where Y i is the response variable, β o is the constant in a linear model, β i are coefficients associated with the independent variables, and ε i is the error term.Notably, the same independent and dependent variables were used as in the spatial modeling approaches.The second model was a spatial lag model (Equation ( 3)): where Y i and Y j are the dependent variables at locations i and j, respectively, X i is the independent variable at i, β i is the regression coefficient, ρ is the spatial autoregressive coefficient, WY j is the spatially lagged dependent variable, and ε is the error term.This model accounts for the fact that the dependent variable is affected by the independent variables in adjacent places, and, thus, the dependent variable is spatially lagged as a predictor.The third model used was the spatial error model (Equation ( 4)): where Y i is the dependent variable at location i, X i is the independent variable, β i is the regression coefficient, ε is the error term, ň is the autoregressive coefficient, Wε is the spatially lagged error term, and ε is the homoscedastic and independent error term.This model accounts for the error terms that are correlated across different spatial units.

Model Comparison
After measuring the inherent degree of SAC for each water quality variable, we compared the outcomes of non-spatial OLS and spatial regression approaches in terms of R 2 and rSAC.To quantify rSAC, we estimated Moran's I for residuals.After the modeling procedure, we evaluated our hypothesis by plotting Moran's I values of the water quality variables against the R 2 and rSAC values for each water quality variable (Figure 4).A few water quality variables presented negative inherent SAC values and were treated as positive in this graph.This is because we intended to concentrate on the magnitude of SAC.where Yi and Yj are the dependent variables at locations i and j, respectively, Xi is the independent variable at i,  is the regression coefficient,  is the spatial autoregressive coefficient, WYj is the spatially lagged dependent variable, and  is the error term.This model accounts for the fact that the dependent variable is affected by the independent variables in adjacent places, and, thus, the dependent variable is spatially lagged as a predictor.The third model used was the spatial error model (Equation ( 4)): where Yi is the dependent variable at location i, Xi is the independent variable,  is the regression coefficient, ε is the error term, ƛ is the autoregressive coefficient, ε is the spatially lagged error term, and ε is the homoscedastic and independent error term.This model accounts for the error terms that are correlated across different spatial units.

Model Comparison
After measuring the inherent degree of SAC for each water quality variable, we compared the outcomes of non-spatial OLS and spatial regression approaches in terms of R 2 and rSAC.To quantify rSAC, we estimated Moran's I for residuals.After the modeling procedure, we evaluated our hypothesis by plotting Moran's I values of the water quality variables against the R 2 and rSAC values for each water quality variable (Figure 4).A few water quality variables presented negative inherent SAC values and were treated as positive in this graph.This is because we intended to concentrate on the magnitude of SAC.

Figure 4. Evaluation of the hypothesis-Moran's I values of the water quality variables appear on the
x-axis, and the model outcomes, R 2 and residual SAC, appear on the y-axis.After spatial regression, water quality variables with a higher amount of spatial autocorrelation (SAC) were hypothesized to exhibit improved hydrologic modeling (i.e., more increases in R 2 and more decreases in residual SAC) than those with lower SAC.

Changes in R² Values
Overall, Moran's I values pertaining to water quality variables varied widely, from 0.01 to 0.82, across all watershed sites (Figure 5).The relationships shown in Figure 5 indicate that the improvements in R² were proportional to the degree of inherent SAC in water quality variables (i.e., the hypothesis predicting increases in R² as a function of the degree of SAC is supported).Whether we treated each state separately or combined them as a whole, strongly autocorrelated water quality parameters over space (i.e., having higher Moran's I values) exhibited greater increases in R² values x-axis, and the model outcomes, R 2 and residual SAC, appear on the y-axis.After spatial regression, water quality variables with a higher amount of spatial autocorrelation (SAC) were hypothesized to exhibit improved hydrologic modeling (i.e., more increases in R 2 and more decreases in residual SAC) than those with lower SAC.

Changes in R 2 Values
Overall, Moran's I values pertaining to water quality variables varied widely, from 0.01 to 0.82, across all watershed sites (Figure 5).The relationships shown in Figure 5 indicate that the improvements in R 2 were proportional to the degree of inherent SAC in water quality variables (i.e., the hypothesis predicting increases in R 2 as a function of the degree of SAC is supported).Whether we treated each state separately or combined them as a whole, strongly autocorrelated water quality parameters over space (i.e., having higher Moran's I values) exhibited greater increases in R 2 values after spatial regression compared to weakly autocorrelated variables (i.e., having lower Moran's I values).This pattern seemed to be less clear when water quality variables within a state had a relatively narrow range of Moran's I (e.g., Delaware).

Changes in rSAC
The values of Moran's I indicating rSAC produced by non-spatial OLS presented a wider range than those from spatial regression (i.e., rSAC for non-spatial OLS from 0.01 to 0.72, while spatial lag rSAC ranged from 0.00 to 0.44, and spatial error, from 0.00 to 0.07).We found a positive correlation between the degree of SAC in water quality variables and rSAC from non-spatial OLS.Conversely, as expected, rSAC values acquired by spatial regressions were in general near zero.Therefore, the larger the Moran's I values possessed by water quality variables, the greater the reduction in rSAC after running models that consider spatial dependence (Figure 6; the hypothesis predicting greater decreases in rSAC, proportional to the degree of SAC in water quality variables, is supported).All states presented significant reduction in rSAC except Delaware, showing a narrow range of Moran's I values of water quality variables.

Changes in rSAC
The values of Moran's I indicating rSAC produced by non-spatial OLS presented a wider range than those from spatial regression (i.e., rSAC for non-spatial OLS from 0.01 to 0.72, while spatial lag rSAC ranged from 0.00 to 0.44, and spatial error, from 0.00 to 0.07).We found a positive correlation between the degree of SAC in water quality variables and rSAC from non-spatial OLS.Conversely, as expected, rSAC values acquired by spatial regressions were in general near zero.Therefore, the larger the Moran's I values possessed by water quality variables, the greater the reduction in rSAC after running models that consider spatial dependence (Figure 6; the hypothesis predicting greater decreases in rSAC, proportional to the degree of SAC in water quality variables, is supported).All states presented significant reduction in rSAC except Delaware, showing a narrow range of Moran's I values of water quality variables.

Overall Changes between Non-Spatial OLS and Spatial Regression Models
In general, the improvement in R² and reduction in rSAC after spatial regression were positive, and the changes of R² and rSAC showed to be linearly a function of the degree of SAC possessed by water quality variables.We found this relationship in each study area, and the results were summarized in Table 4.

Overall Changes between Non-Spatial OLS and Spatial Regression Models
In general, the improvement in R 2 and reduction in rSAC after spatial regression were positive, and the changes of R 2 and rSAC showed to be linearly a function of the degree of SAC possessed by water quality variables.We found this relationship in each study area, and the results were summarized in Table 4.

Summary of Findings
Overall, in this study, we found that the magnitude of model improvement (i.e., increases in R 2 and decreases in rSAC), after both spatial lag and error modeling, is significantly and linearly a function of the SAC inherently possessed by water quality variables (i.e., response variables) (Figure 7).This, in turn, supported our hypothesis that water quality variables with a higher amount of SAC would exhibit greater improvement in model outcomes than those with a lower amount of SAC.

Summary of Findings
Overall, in this study, we found that the magnitude of model improvement (i.e., increases in R 2 and decreases in rSAC), after both spatial lag and error modeling, is significantly and linearly a function of the SAC inherently possessed by water quality variables (i.e., response variables) (Figure 7).This, in turn, supported our hypothesis that water quality variables with a higher amount of SAC would exhibit greater improvement in model outcomes than those with a lower amount of SAC.

Discussion
The results support our hypothesis and offer insights into the field of water quality modeling.Most importantly, the level of SAC in water quality variables has the potential to indicate how much improvement a non-spatial model would experience if SAC was appropriately considered (i.e., increases in R² values and decreases in rSAC).We have demonstrated across divergent watersheds in the USA that the higher the SAC in a water quality variable, the greater the improvements to the model after accounting for SAC.Water quality studies, as previously mentioned, presented better results when considering spatial modeling approaches that account for SAC [2][3][4][5].However, these studies have not considered the magnitude of SAC in the response variable as the main driver of model improvements.Furthermore, we observed that variables with lower degree of inherent SAC (lower Moran's I values) underwent smaller changes in model outcomes compared to those that presented larger Moran's I values.In this sense, higher Moran's I values imply more spatial organization (e.g., strong connection among water quality stations through the stream network) than smaller Moran's I values.This indicates that the need for (and potentially the benefit from) accounting for SAC in water quality modeling increases as the degree of SAC increases.

Discussion
The results support our hypothesis and offer insights into the field of water quality modeling.Most importantly, the level of SAC in water quality variables has the potential to indicate how much improvement a non-spatial model would experience if SAC was appropriately considered (i.e., increases in R 2 values and decreases in rSAC).We have demonstrated across divergent watersheds in the USA that the higher the SAC in a water quality variable, the greater the improvements to the model after accounting for SAC.Water quality studies, as previously mentioned, presented better results when considering spatial modeling approaches that account for SAC [2][3][4][5].However, these studies have not considered the magnitude of SAC in the response variable as the main driver of model improvements.Furthermore, we observed that variables with lower degree of inherent SAC (lower Moran's I values) underwent smaller changes in model outcomes compared to those that presented larger Moran's I values.In this sense, higher Moran's I values imply more spatial organization (e.g., strong connection among water quality stations through the stream network) than smaller Moran's I values.This indicates that the need for (and potentially the benefit from) accounting for SAC in water quality modeling increases as the degree of SAC increases.
In this study, we investigated water quality variables from 10 watersheds, each distinct in geology, land use, soil, and topography.We analyzed a total of 93 water quality variables that also differed among the watersheds.Despite such strong peculiarities among the watersheds, this study reveals a consistent and linear relationship between the SAC of water quality parameters and changes in the model outcomes (R 2 and rSAC).This finding perfectly accords with the study of Kim et al. [13] who evaluated the effect of SAC in soil-landform modeling to find that the degree of SAC in soil variables (i.e., dependent variables) influenced model improvements after the SAC was properly accounted for.
Our findings suggest that future water quality modeling studies should account for SAC to improve the performance of non-spatial approaches, principally when the predictors in the model cannot sufficiently account for all SAC in the model [6,7,9,12,13,15].Overall, the improvements include increasing R 2 and decreasing rSAC.The most important point is that the degrees of these increases and decreases showed to be linearly correlated with the level of SAC in water quality variables.Therefore, water quality studies should not only focus on accounting for spatial autocorrelation, but also on understanding the magnitude of SAC inherent in water quality variables.Doing so, we could point out the degree of connectivity within water quality variables, as well as the improvement in model outcomes of a non-spatial approach before performing a spatial regression.
Adequate information on the degree of hydrologic connectivity among water quality variables is needed in watershed management and policy decisions [41,42].The level of SAC inherent in a variable can allow managers to reveal the complex spatial relationship of water quality as well as its changes from up to downstream.It can uncover dissimilarity patterns among water quality parameters throughout the stream network in study and help in the implementation of policies that are ecologically beneficial to the aquatic ecosystem.Therefore, we highlight that the investigation of SAC in water quality modeling is not only beneficial in the model results, but also in the process of watershed management.
Streams can be considered spatially structured ecological networks, where patterns are usually associated with the in-stream flow and habitat, or even the physical structure of the network.The understanding of these patterns can be limited when only using Euclidean distance [43].For example, two sites that are near to each other can be considered neighbors due to distance in the Euclidean technique, but they can present distinct water quality measures simply due to the water quality origins from vastly different drainage areas.Therefore, we highlight that this is a limitation in this study and further studies should focus on applying spatial network distance techniques to better understand the SAC influence.

Conclusions
Spatial autocorrelation (SAC) is a property possessed by any ecological or environmental variable.Consequently, its incorporation and impacts on modeling results have been studied in much detail in a variety of scientific fields.Our study demonstrates that analyzing SAC in water quality modeling provides benefits beyond just improvements in model outcomes (R 2 and rSAC): it can potentially lead to a better understanding of the extent of spatial organization of water quality variables, as well as serve as a useful screening technique to anticipate the predictability of the spatial pattern in the independent variable used in a spatially explicit model.We also highlight the benefits of understanding the level of SAC possessed by a water quality variable in the process of watershed management and point that network distances techniques could better account for the spatial pattern existent in spatially structured ecological networks such as streams.

Figure 1 .
Figure 1.Conceptualization of the main ideas of the study (PC, principal components; OLS, ordinary least squares; SAC, spatial autocorrelation).

Figure 1 .
Figure 1.Conceptualization of the main ideas of the study (PC, principal components; OLS, ordinary least squares; SAC, spatial autocorrelation).

Figure 3 .Figure 3 .
Figure 3. Upstream area delineation and their respective buffer zones in tones of gray.The solid circles are water quality stations (sites).

Figure 4 .
Figure 4. Evaluation of the hypothesis-Moran's I values of the water quality variables appear on the x-axis, and the model outcomes, R 2 and residual SAC, appear on the y-axis.After spatial regression, water quality variables with a higher amount of spatial autocorrelation (SAC) were hypothesized to exhibit improved hydrologic modeling (i.e., more increases in R 2 and more decreases in residual SAC) than those with lower SAC.
ISPRS Int.J. Geo-Inf.2018, 7, 64 12 of 24 after spatial regression compared to weakly autocorrelated variables (i.e., having lower Moran's I values).This pattern seemed to be less clear when water quality variables within a state had a relatively narrow range of Moran's I (e.g., Delaware).

Figure 5 .
Figure 5. Relationship between the spatial autocorrelation (SAC) of each water quality variable (represented by Moran's I values) and the R² indicating the amount of variance in each water quality variable, explained by topographic, land use, soil groups, and spatial terms.

Figure 5 .
Figure 5. Relationship between the spatial autocorrelation (SAC) of each water quality variable (represented by Moran's I values) and the R 2 indicating the amount of variance in each water quality variable, explained by topographic, land use, soil groups, and spatial terms.

Figure 6 .
Figure 6.Relationship between the spatial autocorrelation (SAC) of each water quality variable (represented by Moran's I values) and the SAC of model residuals (also represented by Moran's I values)."All states combined" showed a general reduction in residual SAC after accounting for spatial autocorrelation in the models of each water quality variable.

Figure 6 .
Figure 6.Relationship between the spatial autocorrelation (SAC) of each water quality variable (represented by Moran's I values) and the SAC of model residuals (also represented by Moran's I values)."All states combined" showed a general reduction in residual SAC after accounting for spatial autocorrelation in the models of each water quality variable.

Figure 7 .
Figure 7. Linear regression models demonstrating that the magnitude of improvement of model performance after spatial lag and error modeling is significantly and linearly explained by the SAC inherently possessed by water quality variables.The Moran's I (x-axis) and Change in R² (y-axis) values were transformed using square-root transformation, while the Change in rSAC (y-axis) were log-transformed.

Figure 7 .
Figure 7. Linear regression models demonstrating that the magnitude of improvement of model performance after spatial lag and error modeling is significantly and linearly explained by the SAC inherently possessed by water quality variables.The Moran's I (x-axis) and Change in R 2 (y-axis) values were transformed using square-root transformation, while the Change in rSAC (y-axis) were log-transformed.

Table 1 .
Description of the ten study sites investigated in this research.

Table 2 .
Study areas (10 watersheds each in one state of the USA and their areas), number of stations per study area, and water quality parameters (response variables) with the respective Moran's I values in parentheses.

Table 3 .
Data sources and details of dependent and independent variables.
Note: PC (Principal Component); WQP (Water Quality Portal); USGS (United States Geological Survey); USDA, NRCS (United States Department of Agriculture, Natural Resources Conservation Service).

Table 4 .
Summary of mean values of spatial autocorrelation (I) in response variables, mean values of the non-spatial OLS outcomes and mean improvement in R 2 and reduction rSAC after spatial regression per state.Additionally, the linear regression model coefficients, R 2 , and p-value of the Changes in R 2 and rSAC per state.significant at the 0.10 level.I: Moran's I values; OLS: ordinary least squares; rSAC: residual spatial autocorrelation; lag-ols: improvement in R 2 from non-spatial ols to spatial lag regression; error-ols: improvement in R 2 from non-spatial ols to spatial error regression; ols-lag: reduction in rSAC from non-spatial ols to spatial lag regression; ols-error: reduction in rSAC from non-spatial ols to spatial error regression. *