Multi-Scale Analysis of the Dependence of Water Quality on Land Use Using Linear and Mixed Models

: Land use inﬂuences water quality in streams at different spatial scales and varies in time and space. Water quality has long been associated with agricultural and urban land uses in catchments. The effects of developed, forest, pasture, and agricultural land use on nitrogen, nitrate, and nitrite (NNN); total phosphorus (TP); total suspended solids (TSS); chemical oxygen demand (COD); dissolved oxygen (DO) and total Kjeldahl nitrogen (TKN) concentrations and their sensitivity were quantiﬁed to spatial pattern differences. The linear mixed modeling framework was used to examine the importance of spatial extent on models with water quality parameters as the response variable and land use types as the predictor variable. The results indicated that land use categories on different water quality parameters were signiﬁcant and dependent on the selected spatial scales. Land use exhibited a strong association with total phosphorus and total suspended solids for close reach distances. Phosphorus is not highly soluble, and it binds strongly to ﬁne soil particles, which are transported by water via runoff. Nitrogen, nitrate, and nitrite, dissolved oxygen, chemical oxygen demand, and total Kjeldahl nitrogen concentrations were better predicted for further reach distances, such as 45 or 50 km, where the best model of nitrogen, nitrate, and nitrite is consistent with the high mobility of NO 3 − .


Introduction
The quality of water supply is usually related to land use within a catchment area as it determines the quality and quantity of runoff before and after a rainfall event [1]. Hence, most water pollution problems are caused by changes in the distribution of land use and its management within a catchment as human activities increase. The main factors behind the alteration of the hydrological system are changes in land cover and land management practices, causing changes in runoff as well as the quality of receiving water [2,3].
Watershed factors, such as land use, landscape, and physiographical characteristics, may have a direct impact on the quality of river water [4,5]. The shifts in urban land use and land cover influence spatial and temporal runoff trends, which in turn impacts surface water quality [6]. Changing land use and land management practices are therefore regarded as central factors in altering the hydrological system, causing changes in runoff and water quality [7].
Land use changes in agricultural, forest, marsh and developed areas also affect soil quality, nutrient fluxes, soil surface temperature and native species assemblies, and these changes can influence watershed hydrology by altering interception, drainage, evapotranspiration, and groundwater recharge levels, resulting in changes in timing and surface and river runoff amounts [8,9]. Land uses affect water quality, whether positive or negative; for example, in forests and other areas with different types and coverage of vegetative surfaces, most precipitation infiltrates into the soil rather than running off. As a result, stream flows are steady, and water quality is excellent. In developed zones with asphalt 1,900,000 m 3 of surface water and groundwater each day. About 1 million people on farms and in small towns use 650,000 m 3 each day, mostly groundwater [37]. Agricultural products that come from the basin include corn, soybeans, wheat, hay, vegetables, fruit, dairy products, livestock, and forest products. The hydraulic conditions of the Wabash River have been modified due to development within the basin, construction of lakes, and flood protection projects adjacent to the streams and rivers to protect the various communities.
The Wabash River has been steadily polluted since the settlement period, largely due to agricultural growth and increased human impacts. These past patterns appeared to be reversed in 1984, when an immediate and dramatic change was established in the middle west, the Wabash fishery, which continues [36,38]. In a study of riparian wetlands in Indiana streams, Gammon proposed that these improvements were due to slow but accumulated point-source decrease in the biological oxygen demand loading of the river [38]. Several negative impacts of agricultural activities persist, including silting, rapid drainage due to the tiling of fields, and fertilizer and pesticide inputs. Loss of wetlands and natural riparian zones has increased flooding, and deforestation has increased riverbank erosion, sediments, and contaminants from agricultural fields and urban areas have contributed to the Gulf of Mexico "Hypoxic Zone".
Regression models have been used to determine the relationship between land use and water quality parameters. A comprehensive study that investigated the relationship of nitrate concentrations in streams to row crop land use in Iowa found that the nitrate concentration decreases with watershed size [39]. In another study, ordinary least squares regression models were applied to examine the spatially varying relationship between landscape metrics and topography with water quality parameters [23]. Traditional regression models assume that "random variables" must be independently and equally distributed; however, these assumptions are not frequently satisfied in environmental studies [39].
Linear Mixed Models (LMM) are a natural extension of classical linear regression models to allow for the incorporation of random effects. LMM provide the ability to handle non-probability-based sampling schemes and spatial correlation structures between samples [40,41]. The linear mixed model has been used to estimate water quality using turbidity and discharge [41], water quality, and landscape characteristics at watershed scales [34,42].
The objective of this research is to examine the relationship between land-use and water quality for Wabash River watershed sub-basins. The scenarios explored are reach distance, buffer width and catchment using Linear Regression Modeling and LMM using as fixed effects the land use, and as Random Effects the Site, Year and Site*Year, addressing the following questions: 1. At what spatial scale does landscape heterogeneity act to influence water quality, and 2. How do spatial and temporal variation in land use within and across watersheds influence water quality pollutants in the basins inside of the Wabash River watershed. The model results can provide insight for categorizing watersheds and offer information that might assist in improving local land use and water quality management.

Study Area
The Wabash River and its tributaries generally exhibit moderate water quality; the Wabash River is the largest southward-flowing tributary of the Ohio River and has a watershed area of 85,340 km 2 [35]. Decades of consuming and developing land in the Wabash River Watershed have degraded the quality of its waters. By 1960, 48% of Indiana was subject to artificial drainage [43]. The topography ranges from 98.76 m (324 feet) above sea level in the southwest corner of the state at the mouth of the Wabash River to 383.14 m (1257 feet) in the extreme east-central portion of the state. The average monthly temperature in Indiana ranges from −1 • C in January to around 24 • C in July [43]. Precipitation varies from 92 to 112 cm of rainfall from north to south, and 41 to 187 cm of snow, and snowfall contributes 5-18 cm of average annual precipitation [44].

Watershed Characteristics
About two-thirds of the Wabash River watershed is devoted to agricultural cropland, and an additional 8.2% is in pasture or grassland. Forests or woodlands constitute only 13.5% of the basin's land area [36]. The study focuses on 17 sub-basins inside the Wabash Basin ( Figure 1). Percentage of developed land in a watershed was defined as the sum of NLCD (National Land Cover Database) low-, medium-, high-intensity, and open space land-cover classes. The deciduous forest as forest, pasture/hay as pasture, cultivated crops as agriculture, and rest as the sum of open water, barren, shrubland, and herbaceous ( Figure 2). The land cover percentages were calculated within watersheds. varies from 92 to 112 cm of rainfall from north to south, and 41 to 187 cm of snow, and snowfall contributes 5-18 cm of average annual precipitation [44].

Watershed Characteristics
About two-thirds of the Wabash River watershed is devoted to agricultural cropland, and an additional 8.2% is in pasture or grassland. Forests or woodlands constitute only 13.5% of the basin's land area [36]. The study focuses on 17 sub-basins inside the Wabash Basin ( Figure 1). Percentage of developed land in a watershed was defined as the sum of NLCD (National Land Cover Database) low-, medium-, high-intensity, and open space land-cover classes. The deciduous forest as forest, pasture/hay as pasture, cultivated crops as agriculture, and rest as the sum of open water, barren, shrubland, and herbaceous (Figure 2). The land cover percentages were calculated within watersheds. Cultivated crops account for 79.76% of land uses inside of the Wabash Basin mostly for corn and soybean, 8.88% are developed areas in general, 6.14% are forest, and pasture constitutes only 2.96% of the sub-basin average areas in 2001 (Table 1). The "Path" column in Table 1 shows the longest flow path within each sub-basin. The sub-basin areas of the study sites vary several orders in magnitude (85-1334 km 2 ). Land use composition changes across watersheds, such as developed land varies between 5.5 to 14.7%, forest 1.8 to 25.4%, pasture 0.0 to 8.5% and agriculture land 60.2 to 88.3% [45], with highest differences in agriculture land and the smallest in pasture land in 2001 (Table 1).   (Table 1). The "Path" column in Table 1 shows the longest flow path within each sub-basin. The sub-basin areas of the study sites vary several orders in magnitude (85-1334 km 2 ). Land use composition changes across watersheds, such as developed land varies between 5.5 to 14.7%, forest 1.8 to 25.4%, pasture 0.0 to 8.5% and agriculture land 60.2 to 88.3% [45], with highest differences in agriculture land and the smallest in pasture land in 2001 (Table 1).

Data
The Digital Elevation Model (DEM), expressed as a 30-m raster from USGS, was used to delineate upstream watersheds using water quality station locations. The National Hydrological Dataset Plus [46] was used to characterize the stream network. The NHD dataset was used to improve hydrologic boundary delineation [47]. Water quality data [48] from 2001 to 2016 were obtained from the United States Environmental Protection Agency's STORET (Table 1), and they were aggregated yearly using the arithmetic mean. Land cover percentages were summarized from the National Land Cover Database (NLCD), and a raster dataset developed from 30-m data from 2001, 2006, 2011, and 2016. After watershed delineation, the maps were overlaid with the NLCD in ArcGIS, and land-cover class percentages were extracted and calculated for each sub-watershed area for the years 2001 (Table 1), 2006, 2011 and 2016, and the land-cover percentages of the missing years were calculated using linear extrapolation.
Riparian buffers were defined as the width of n meters on each side of the stream center extended to the entire upstream above all water quality stations [12]. Reach distance areas were defined as the area from the quality station upstream using the flow-length concept, which was calculated using the Flow Length tool in ArcGIS to calculate the flow path to the outlet within the basin at 2, 5, 10, 25, 35, 40, 45, and 50-km constrained by watershed boundaries (Figure 2a-e). The flow length is calculated using the eightdirectional algorithm described by [49] O'Callaghan and Jenson [50] and implemented in the ARC/INFO GIS system (ESRI, Redlands, CA, USA). Riparian buffers were delineated using NHD stream information, and buffer analysis in ArcGIS was used to create areas at 200, 500, and 1000-m from streams (Figure 2f-h); watershed boundaries further constrained all buffer analyses.
The watershed WAW030_0022 is shown in Figure 3, and land uses for the reach distance at 25 km. Agriculture covers most of the reach distance at 25 km, and the remaining area of the figure shows the riparian buffer at 100 m. length concept, which was calculated using the Flow Length tool in ArcGIS to calculate the flow path to the outlet within the basin at 2, 5, 10,25,35,40,45, and 50-km constrained by watershed boundaries (Figure 2a-e). The flow length is calculated using the eight-directional algorithm described by [49] O'Callaghan and Jenson [50] and implemented in the ARC/INFO GIS system (ESRI, Redlands, CA, USA). Riparian buffers were delineated using NHD stream information, and buffer analysis in ArcGIS was used to create areas at 200, 500, and 1000-m from streams (Figure 2f-h); watershed boundaries further constrained all buffer analyses.
The watershed WAW030_0022 is shown in Figure 3, and land uses for the reach distance at 25 km. Agriculture covers most of the reach distance at 25 km, and the remaining area of the figure shows the riparian buffer at 100 m.  Table 2 shows the land use percentages for different reach distances in the year 2001 for watersheds WAW030-0022, WDE050-0022, and WSU020-0003. At water quality station WAW030-0022, forest area percentage decreases from a 2 km to 45 km reach distance; the 45 km reach distance has a similar forestry land use percentage value to that for the catchment. At water quality station WSU020-0003, forest decreases from a reach distance of 2 km to the catchment, and developed increases from the reach distance of 10 to 50 km (Table 2).   Table 2 shows the land use percentages for different reach distances in the year 2001 for watersheds WAW030-0022, WDE050-0022, and WSU020-0003. At water quality station WAW030-0022, forest area percentage decreases from a 2 km to 45 km reach distance; the 45 km reach distance has a similar forestry land use percentage value to that for the catchment. At water quality station WSU020-0003, forest decreases from a reach distance of 2 km to the catchment, and developed increases from the reach distance of 10 to 50 km (Table 2). Six water quality parameters (Table 3), between 2001 and 2016 were selected for the 17 study watersheds based on availability and importance to human and aquatic life. Chemical oxygen demand (COD) and dissolved oxygen (DO) were selected, as they are related to the estimation of the probability of algae bloom and habitat quality for fish and other aquatic species. Total solids (TS) were chosen as they can be used as a quantitative measure of aesthetics as suspended sediments in streams make the water appear cloudy [25]. Nitrogen, nitrate, and nitrite (NNN) and total phosphorus (TP) are generally considered direct measures of human activity in an area due to their presence in fertilizers, vehicle emissions, and impervious surfaces [25]. Total Kjeldahl Nitrogen (TKN) is the total concentration of organic nitrogen and ammonia. Table 3. Main characteristics of study drainage areas and their water quality [48].

Spatial Analysis
To determine the association between land use and water quality at each water quality station site, the riparian buffers and reach areas from the outlet were used to extract land use information from the land use raster (Figure 2). Using GIS tools, areas were determined for each land cover within each scenario (a riparian buffer area or reach distance area) divided by the entire covered area to derive the percentage of the total area covered by each riparian buffer or reach distance area. Land use map features for years 2001, 2006, 2011, and 2016 were extracted, cutting out data using the area boundaries.

Statistical Analysis
Average yearly water quality data were calculated for all the stations to reduce the impact of missing values. The normality of the dependent variables was checked using the one-sample Kolmogorov-Smirnov (KS) test. All water quality measurements were found to lack normality with 95% confidence. Consequently, the dependent variables were transformed using the Box-Cox technique to find adequate power transformation (or logarithmic) that produces a normal distribution with homoscedastic residuals.
The relationship between land use and water quality inside the Wabash River Watershed sub-basins was examined using multiple linear regression (MLR) and LMM. The MLR models were fitted to evaluate the relationship between a response (i.e., single water quality parameter) and predictors (i.e., land use percentages). The adjusted coefficient of multiple determination (adjusted R-squared) was used to measure the predictive power of the MLR model with a penalty for model complexity [22,51,52]. The standard partial regression coefficients (B) have been introduced to demonstrate the relative significance of the various response predictors, which are independent of the data units [22,53,54]. A predictor with a bigger B has a better correlation with the response. Separate stepwise MLR tests for each reach distance scale are determined for a given water quality parameter. The final MLR model with stepwise variable selection was selected based on specific criteria: (1) the model had the highest adjusted R 2 among all the reaches, buffers, and catchments; (2) the significance of the coefficients of the model and predictors are both at the 95% confidence level (p ≤ 0.05). Subsequently, the goodness-of-fit of the final MLR models was evaluated by scattering and simple linear regression analysis of observed versus predicted values [14].
A set of LMM was used to establish the relationship between land use proportions and water quality measures. The Linear Mixed Model is beneficial in spatial sciences as it accounts for the spatial autocorrelation between samples [41]. The spatial autocorrelation was used since there are multiples catchments, and there is a temporal autocorrelation because the data are between 2001 and 2016. To quantify the spatial variation in the relationships between individual land uses and water quality parameters, the linear mixed model was used. All analyses were conducted in R. For each water quality parameter, 192 univariate models were computed, with each water quality parameter as the response variable, and each of the four land uses types, at each of the eight spatial extents, as the predictor variable. Separate models were developed for each water quality parameter, where 4 land uses × 8 spatial extents = 32 models times 6 water quality parameters = 192 models. Because each of the water quality stations is nested within eight spatial extents, LMM is used, in which the rivers within regions could vary in mean water quality parameter concentrations (i.e., a random intercept) and/or the rivers within regions could vary in the response of river water quality parameters to differences in land uses (i.e., a random slope).
The analysis was conducted in two steps. Firstly, the amount of water quality parameter variation occurring among regions was quantified. If there was a significant (p < 0.05) amount of among scale variation, each area varies in its mean river water quality parameter concentration (i.e., a random intercept). Secondly, the potential of the effects of the land uses on river water quality parameters vary regionally and yearly. By creating and comparing univariate models, we avoided any problems with multicollinearity in the data (which are likely across spatial extents and land uses).
A random term accounts for the inherent spatial variation of observations belonging to the same location "Site," as well as a temporal-dependent random effect that is meant to account for extraordinary weather events for a given year "Year" and an additional random term that is the combination of year and site effect. Below, the parameter estimates for the fixed effects are presented as land use percentages, such as % Developed, % Forest, % Agriculture, and % Pasture. Formulating the predictive model for water quality parameters, the linear mixed-effects model in Equation (1) is.
where y is the response vector (water quality parameter), β represents all fixed effects; u the random effects, X design matrix for the fixed effects "land uses," Z design matrix for the random effects "site and year"; u and ε are independent. Random effects encompass variation among individuals [55]. The nlme package in R was used for the above statistical analysis. The adjusted R squared, Equation (2), was used to adjust the R-value to the number of explanatory terms in the model relative to the number of data points where p is the total number of explanatory variables in the model (not including the constant term) and n is the size of the sample. The adjusted R-squared values were used to compare the explanatory power of models that contain different numbers of data. The adjusted R-squared value can be negative, and it is always lower than the R-squared value.

Results and Discussion
Developed and pasture land show that there is not much difference in land proportion in reach distance greater than 25 km (Figure 4a,c). In contrast, for forest, there is a difference among reach distances for all reach distances (Figure 4b), and agriculture land has the most significant differences among reach distances (Figure 4d), varying from 0.1 to 89%. Agricultural land presents the smallest values at 2 and 10 km in some sites, where developed land or forest have a high percentage, indicating that the urban areas are closer to watershed outlets and near rivers. Developed land shows outliers at the 2 km reach distance; it shows that some of the sites are mainly developed land use at a 2 km reach distance. Forest land use shows a decreasing percentage from closer reach distances to further reach distances (Figure 4b). In contrast, agricultural land use shows an increasing proportion of land use from closer reach distances to further reach distances (Figure 4d). The concentrations of nutrients in areas of mixed land use were found lower than in agricultural or urban areas but higher than those in undeveloped areas [56]. The land use spatial distribution varies, influencing water quality. In order to determine at what distance the heterogeneity of the landscape best explains water quality parameters, the fixed effects (developed, forest, pasture, and agriculture land use) were calculated for reach distances varying from 2, 10, 25, 35, 40, 45, and 50-km; riparian buffers of 200, 500, and 1000-m and basin, and an LMM with site, year, and site and year as random effects were fitted to each of them.
The adjusted R-squared (R 2 ) shows that the explanatory power of heterogeneity of landscape has the best model of NNN at a reach distance of 45 km (Figure 5a). In contrast, the results across riparian buffers did not show a relationship between NNN and land use with R 2 < 0.03 ( Figure 6). The NNN was best explained by land use quantified within the entire watershed [57]. Comparing results for the reach distance of 45 km with those for the entire watershed (Figure 5a), the best model for the entire watershed has an R 2 of 0.41 for LMM with year as random effects, while for 45 km, it has a R 2 of 0.64, showing that the total NNN variance is best explained by the land uses at 45 km with year and site as random effects (Figure 5a). Figure 7 shows the parameter estimates of the best fitting model where fixed effects were calculated at 45 km and the corresponding p-values. Figure 4 shows the boxplot of the percentages of the land uses at different reach The concentrations of nutrients in areas of mixed land use were found lower than in agricultural or urban areas but higher than those in undeveloped areas [56]. The land use spatial distribution varies, influencing water quality. In order to determine at what distance the heterogeneity of the landscape best explains water quality parameters, the fixed effects (developed, forest, pasture, and agriculture land use) were calculated for reach distances varying from 2, 10, 25, 35, 40, 45, and 50-km; riparian buffers of 200, 500, and 1000-m and basin, and an LMM with site, year, and site and year as random effects were fitted to each of them.
The adjusted R-squared (R 2 ) shows that the explanatory power of heterogeneity of landscape has the best model of NNN at a reach distance of 45 km (Figure 5a). In contrast, the results across riparian buffers did not show a relationship between NNN and land use with R 2 < 0.03 ( Figure 6). The NNN was best explained by land use quantified within the entire watershed [57]. Comparing results for the reach distance of 45 km with those for the entire watershed (Figure 5a), the best model for the entire watershed has an R 2 of 0.41 for LMM with year as random effects, while for 45 km, it has a R 2 of 0.64, showing that the total NNN variance is best explained by the land uses at 45 km with year and site as random effects (Figure 5a). Figure 7 shows the parameter estimates of the best fitting model where fixed effects were calculated at 45 km and the corresponding p-values.  Figure 4, the model estimates are valid for developed, forest, pasture and agriculture percentages varying from 3 to 50%, 4 to 44%, 0.1 to 18% and 14 to 87%, respectively. Figure  8b shows changes in the mean TP based on the model estimates, for the plausible percentages of forest, developed, and agriculture in the best model (Figure 5b). More specifically, we looked at TP changes when developed increases and agriculture decreases within two levels of forest and pasture: pasture high 0.18 (orange line in Figure 8b) and The R 2 shows that the explanatory strength of landscape heterogeneity has the best model for DO at a reach distance of 45 km (Figure 5e); however; riparian has similar R 2 with values of 0.62 with year and site as random effects for 200, 500 and 1000 m ( Figure  6). The result is in contrast with the findings of [16], who found that land uses for closer reach distance better describe the variation in water quality due to its higher heterogeneity. DO values are affected in streams by temperature, organic discharges, and fertilizer runoff from farms and lawns that changes with hydrology fluctuations, which are explained by the random model with year and site as random effects. Figure S1e in Supplemental Materials shows the parameter estimates of the best fitting model where fixed effects were calculated at 45 km and the corresponding p-values. Based on Figure 4, the model estimates are valid for developed, forest, pasture and agriculture percentages varying from 4 to 17%, 2 to 28%, 1 to 9% and 57 to 88%, respectively. The changes in the mean DO base shows on the model estimates ( Figure S1e in Supplemental Materials), for the plausible percentages of forest, developed, pasture, and agriculture in the best model ( Figure 5e). In particular, we looked at DO changes when developed and agriculture increase and forest decreases within two levels of forest and pasture: pasture high 0.09 (orange line in Figure 8e) and pasture low 0.01 (blue line in Figure 8e). Figure 8e shows that as developed and agriculture increases and forest decreases, the mean DO decrease having the lowest DO of 8.02 mg L −1 in high pasture condition (orange line). According to Figure 8e, the highest DO of 10.30 mg L −1 is when developed and agriculture decrease, and forest increases in the blue line.
The adjusted R-squared suggests that the explanatory power of heterogeneity of the landscape has the best model for TKN at a distance of 50 km (Figure 5f); similarly, the highest R 2 of 0.68 ( Figure 6) was found for riparian buffers for TKN at 1000 m. The parameter estimates of the best fitting model where fixed effects were calculated at 50 km and the corresponding p-values ( Figure S1f in Supplemental Materials). Based on Figure  4, the model estimates are valid for developed, forest, pasture, and agriculture percentages varying from 4 to 16%, 2 to 29%, 1 to 8%, and 55 to 89%, respectively. Figure  S1f in Supplemental Materials shows the changes in the mean TKN based on the model estimates, for the plausible percentages of forest, developed, and agriculture in the best model (Figure 5f). Specifically, we analyzed shifts in TKN when developed, and agriculture increases within two levels of forest and pasture: pasture high 0.08 (orange line in Figure 8f) and pasture low 0.01 (blue line in Figure 8f). Figure 8f shows that as developed and agriculture increase and forest decreases, mean TKN increases, and TKN has higher value in low pasture condition (blue line) compared to low pasture (orange line). According to Figure 8f, the lowest TKN 0.43 mg L −1 is when developed and

Relationship between Land Use and Water Quality
The study was designed to understand the influence of land use on water quality throughout different reach distances and riparian buffers over a long period (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016). Because land uses are not randomly distributed throughout the basins, changing reach distances and riparian buffer widths of the contributing area alters the relative proportion of different land uses, the perceived dominant land use, and the observed rate of land-use change with downstream distance. Comparing the whole catchment with reach distances and riparian buffers, the whole catchment land use distribution appears more similar to each other, than they do to smaller reach distances or riparian buffer land use distributions. The study was not intended to predict particular water quality values at individual points over the years, as the design approach was tailored to spatial relationships. Therefore, this analysis is useful in determining what reach distance and riparian buffer, the heterogeneity of the landscape, best explains water quality parameters. This study found that the LMM best explained the variance of mean NNN and DO at a reach distance of 45 km; for mean TP and TSS, the best reach distance is 10 km; and for mean COD and TKN, the best reach distance is 50 km. Changing the reach distances or riparian buffer widths would also affect the relationship between land use and water quality. For example, Figure 7 shows that NNN has a high R 2 at 40 km and 45 km reach distances, indicating that at those reach distances, the influence of landscape configuration is a better predictor of water quality than other reach distances.
As the developed proportion increases, the agricultural proportion decreases, and the NNN, TP, and TSS means reduce as shown in Figure 8a-c, respectively. This result is beneficial because it shows that TP has a strong relationship with agricultural and developed land use [2]. As developed and agricultural proportion increases, the COD and DO     Figure 4, the model estimates at 45 km are valid for developed, forest, pasture and agriculture percentages varying from 4.4 to 17%, 1 to 28%, 1 to 9% and 57 to 88%, respectively. Figure 8a visualizes the changes in the mean NNN based on the model estimates, for the plausible percentages of forest, developed, and agriculture of the best model (Figure 5a). More specifically, we looked at NNN changes when developed increases and agriculture decreases within one level with forest and pasture constant: forest high area of 27.9% (blue line in Figure 8a) and forest low area of 1.8% (orange line in Figure 8a). Figure 8a shows that as developed increases and agriculture decreases, the mean NNN decreases, and NNN has higher values in low forest conditions (orange line) compared to the high forest (blue line). Nitrogen losses from Indiana watersheds typically range from 4 to 20 Kg N/ha/yr, depending on precipitation; in the streams, inorganic N loads are 90% or more NO 3 -N [58]. In intensively farmed areas, fertilizer use and drainage modifications appear to interact to produce large NO 3 − N losses through tiles in the cornbelt region [59]. In channelized headwater streams, precipitation and solutes are drained rapidly, bypassing riparian buffer areas with Indiana's vast network of subsurface tile drains [60]. Nitrogen in the form of NO 3 − is extremely mobile in soil where nitrate degradation by leaching is a physical process in which soluble NO 3 − moves below the root zone with soil water. The best model of NNN at a reach distance of 45 km is consistent with the high mobility of NO 3 − .
Water 2021, 13, x FOR PEER REVIEW 16 of 20 mean decrease as shown in Figure 8d,e, respectively, and the TKN mean increases as shown in Figure 8f. The amount of nitrogen generated from agricultural lands was found about seven times higher than from impermeable urban areas and nine times higher than produced in pervious urban areas [2]. In a study with land use change, researchers found that when developed area increases, COD concentration decreases [61]. Forested land has been widely identified as having an essential role in improving the quality of surface water [62,63]. Similar findings were found in this study, where increasing forest land proportion decreases mean NNN, TP, TSS, and TKN, which may occur due to its role in the conservation of water, prevention of soil erosion and the filtration of overland pollutants [1]. The R 2 indicates that the explanatory strength of landscape heterogeneity shows the best model of TP at a reach distance of 10 km (Figure 5b). This result is similar for riparian buffers with an R 2 of 0.79 for LMM with year and site as random effects for 200, 500 and 1000 m ( Figure 6). The results are similar to those of Liu et al. (2017), who found that the TP concentration was most correlated to land use in closer reach distances. Phosphorus reaches water in agricultural settings. Phosphorus is not highly soluble, and it moves slowly. It binds strongly to fine soil particles such as clay-sized material (<2 microns), and they are transported by flow through artificial tile drainage. Midwest agricultural land has tile drainage, where phosphorus moves in the form of ions of phosphate PO 4 −3 (Royer et al., 2006), which is consistent with the result that the best TP model is at a reach distance of 10 km.
The parameter estimates ( Figure S1b in Supplemental Materials) show the best fitting model where fixed effects were calculated at 10 km and the corresponding p-values. Based on Figure 4, the model estimates are valid for developed, forest, pasture and agriculture percentages varying from 3 to 50%, 4 to 44%, 0.1 to 18% and 14 to 87%, respectively. Figure 8b shows changes in the mean TP based on the model estimates, for the plausible percentages of forest, developed, and agriculture in the best model (Figure 5b). More specifically, we looked at TP changes when developed increases and agriculture decreases within two levels of forest and pasture: pasture high 0.18 (orange line in Figure 8b) and pasture low 0.003 (blue line in Figure 8b). Figure 8b shows that as developed increases and agriculture decreases, mean TP decreases, and TP has lower values in high pasture conditions (orange line) compared to low pasture (blue line). According to Figure 8b, as forest increase from 0.10 to 0.50 (blue line), TP decreases from 0.76 to 0.14 mg L −1 .
The adjusted R-squared for TSS shows that the explanatory strength of landscape heterogeneity has the best model at a 10 km reach distance (Figure 5c). Riparian scales show lower adjusted R 2 values ( Figure 6). The parameter estimates show the best fitting model with year-site as a random effect where fixed effects were calculated at 10 km and the corresponding p-values ( Figure S1c in Supplemental Materials). Based on Figure 4, the model estimates are valid for developed, forest, pasture and agriculture percentages varying from 3 to 50%, 4 to 45%, 1 to 18% and 14 to 87%, respectively. Figure S1c in Supplemental Materials shows the changes in the mean TSS based on the model estimates, for the plausible percentages of forest, developed, and agriculture in the best model. In particular, we looked at TSS changes when developed, and forest increases and agriculture decreases within two levels of pasture and forest: pasture 0.01 (orange line in Figure 8c), mean TSS decreases from 90.82 to 33.97 mg L −1 ; and pasture 0.18 (blue line in Figure 8c), shows a decreasing mean TSS from 44.44 to 20.57 mg L −1 , additionally when forest decreases, mean TSS decreases too.
The R 2 suggests that the explanatory power of heterogeneity of the landscape has the best model for COD at a reach distance of 50 km (Figure 5d). COD for riparian buffers shows similar results with the highest R 2 of 0.62 ( Figure 6). Figure S1d in Supplemental Materials shows the parameter estimates of the best fitting model where fixed effects were calculated at a 50 km reach distance and the corresponding p-values. Based on Figure 4, the model estimates are valid for developed, forest, pasture and agriculture percentages varying from 4 to 16%, 2 to 29%, 1 to 8% and 55 to 89%, respectively. Figure S1d in Supplemental Materials shows the changes in the mean COD based on the model estimates, for the plausible percentages of forest, developed, and agriculture in the best model ( Figure 5d). Specifically, we analyzed shifts in COD when developed, and agriculture increases within two levels of forest and pasture: pasture high 0.08 (orange line in Figure 8d) and pasture low 0.01 (blue line in Figure 8d). The COD variation was better explained by closer reach distances with site as a random effect which is likely due to the proximity to urban areas of water quality stations, where daily inputs, such as urban stormwater runoff, including plant debris, sewage, waste, gasoline and motor oil, heavy metals, fertilizer, and pesticides are common reasons for higher COD. Figure 8d shows that as developed and agriculture increases and forest decreases, mean COD decreases, and COD has lower values in high pasture proportion condition (orange line) compared to low pasture (blue line). According to Figure 8d, the highest COD of 24.76 mg L −1 is when developed and agriculture decreases, and forest increases in the blue line; in contrast, the lowest COD 7.70 mg L −1 is when developed and agriculture increases, and forest decreases in the orange line.
The R 2 shows that the explanatory strength of landscape heterogeneity has the best model for DO at a reach distance of 45 km (Figure 5e); however; riparian has similar R 2 with values of 0.62 with year and site as random effects for 200, 500 and 1000 m ( Figure 6). The result is in contrast with the findings of [16], who found that land uses for closer reach distance better describe the variation in water quality due to its higher heterogeneity. DO values are affected in streams by temperature, organic discharges, and fertilizer runoff from farms and lawns that changes with hydrology fluctuations, which are explained by the random model with year and site as random effects. Figure S1e in Supplemental Materials shows the parameter estimates of the best fitting model where fixed effects were calculated at 45 km and the corresponding p-values. Based on Figure 4, the model estimates are valid for developed, forest, pasture and agriculture percentages varying from 4 to 17%, 2 to 28%, 1 to 9% and 57 to 88%, respectively. The changes in the mean DO base shows on the model estimates ( Figure S1e in Supplemental Materials), for the plausible percentages of forest, developed, pasture, and agriculture in the best model ( Figure 5e). In particular, we looked at DO changes when developed and agriculture increase and forest decreases within two levels of forest and pasture: pasture high 0.09 (orange line in Figure 8e) and pasture low 0.01 (blue line in Figure 8e). Figure 8e shows that as developed and agriculture increases and forest decreases, the mean DO decrease having the lowest DO of 8.02 mg L −1 in high pasture condition (orange line). According to Figure 8e, the highest DO of 10.30 mg L −1 is when developed and agriculture decrease, and forest increases in the blue line.
The adjusted R-squared suggests that the explanatory power of heterogeneity of the landscape has the best model for TKN at a distance of 50 km (Figure 5f); similarly, the highest R 2 of 0.68 ( Figure 6) was found for riparian buffers for TKN at 1000 m. The parameter estimates of the best fitting model where fixed effects were calculated at 50 km and the corresponding p-values ( Figure S1f in Supplemental Materials). Based on Figure 4, the model estimates are valid for developed, forest, pasture, and agriculture percentages varying from 4 to 16%, 2 to 29%, 1 to 8%, and 55 to 89%, respectively. Figure S1f in Supplemental Materials shows the changes in the mean TKN based on the model estimates, for the plausible percentages of forest, developed, and agriculture in the best model ( Figure 5f). Specifically, we analyzed shifts in TKN when developed, and agriculture increases within two levels of forest and pasture: pasture high 0.08 (orange line in Figure 8f) and pasture low 0.01 (blue line in Figure 8f). Figure 8f shows that as developed and agriculture increase and forest decreases, mean TKN increases, and TKN has higher value in low pasture condition (blue line) compared to low pasture (orange line). According to Figure 8f, the lowest TKN 0.43 mg L −1 is when developed and agriculture decrease and forest increases in the orange line; in contrast, the highest TKN 0.79 mg L −1 is when developed and agriculture increase and forest decreases in the blue line.

Relationship between Land Use and Water Quality
The study was designed to understand the influence of land use on water quality throughout different reach distances and riparian buffers over a long period (2001-2016). Because land uses are not randomly distributed throughout the basins, changing reach distances and riparian buffer widths of the contributing area alters the relative proportion of different land uses, the perceived dominant land use, and the observed rate of land-use change with downstream distance. Comparing the whole catchment with reach distances and riparian buffers, the whole catchment land use distribution appears more similar to each other, than they do to smaller reach distances or riparian buffer land use distributions. The study was not intended to predict particular water quality values at individual points over the years, as the design approach was tailored to spatial relationships. Therefore, this analysis is useful in determining what reach distance and riparian buffer, the heterogeneity of the landscape, best explains water quality parameters. This study found that the LMM best explained the variance of mean NNN and DO at a reach distance of 45 km; for mean TP and TSS, the best reach distance is 10 km; and for mean COD and TKN, the best reach distance is 50 km.
Changing the reach distances or riparian buffer widths would also affect the relationship between land use and water quality. For example, Figure 7 shows that NNN has a high R 2 at 40 km and 45 km reach distances, indicating that at those reach distances, the influence of landscape configuration is a better predictor of water quality than other reach distances.
As the developed proportion increases, the agricultural proportion decreases, and the NNN, TP, and TSS means reduce as shown in Figure 8a-c, respectively. This result is beneficial because it shows that TP has a strong relationship with agricultural and developed land use [2]. As developed and agricultural proportion increases, the COD and DO mean decrease as shown in Figure 8d,e, respectively, and the TKN mean increases as shown in Figure 8f. The amount of nitrogen generated from agricultural lands was found about seven times higher than from impermeable urban areas and nine times higher than produced in pervious urban areas [2]. In a study with land use change, researchers found that when developed area increases, COD concentration decreases [61].
Forested land has been widely identified as having an essential role in improving the quality of surface water [62,63]. Similar findings were found in this study, where increasing forest land proportion decreases mean NNN, TP, TSS, and TKN, which may occur due to its role in the conservation of water, prevention of soil erosion and the filtration of overland pollutants [1].
Pasture land played a dominant role in influencing TN and TP in the Jinsha River [18]. This research had similar findings, where increasing pasture proportion increases the mean TP, TSS, and TKN, as shown in Figure 8b,c,f, respectively. Pasture land has a better relationship with water quality for further reach distances, likely due to the averaging out of the effects of local heterogeneity, that is a widely observed phenomenon in ecological trends [42,64]. As pasture proportion decreases, the mean COD and DO increase, as shown in Figure 8d,e, respectively. Pasture land was found inversely proportional to COD [61]. Pasture can benefit water quality by reducing the influx of nutrients entering into streams.
Increasing agricultural land increases the TP mean, which agrees with previous findings [2,65,66], suggesting that surface runoff and cropland degradation is the principal cause of phosphorus in streams [57]. Nitrogen and phosphorus losses in surface runoff are also more significant in agricultural areas than land with natural vegetation, where phosphorus losses are transported by sediment eroded from cropland [67]. This study showed (Figure 8a) that NNN increases when agricultural land proportion increases. The result is consistent with nitrogen fertilizer use in the corn belt area, which increases nitrogen inputs to watersheds [68].

Conclusions
This study showed that for developed and pasture land use percentages, there is not much difference among study watershed reach distances. The 2-km reach distance has the most outliers in developed and pasture land. In contrast, forest shows differences among reach distances. Agricultural land has the most significant differences among reach distances, varying from 0.1 to 89%. The developed and pasture land uses show that there is likely not a difference among reach distances, while in contrast, forest and agriculture are likely to have differences among reach distances, and the data are more dispersed at 2 km than at longer reach distances. This study revealed that closer reach distances have more dispersed data, due to the area size and heterogeneity, in contrast to farther distances, which experienced less dispersed data and more uniformity in land use.
The study demonstrated the efficacy of the linear mixed model to examine the importance of spatial extent on models with river water quality parameters. The adjusted R 2 improved from 0.36 to the highest adjusted R 2 of 0.79 for NNN at a reach distance of 45 km. The best model for TP is at a reach distance of 10 km, where the model could explain 82% of the variance, and the highest adjusted R 2 of 0.26 for TSS is at the same reach distance. The reason that TP and TSS have a higher adjusted R 2 at the same distance is likely since TP adheres to sediment particles and is likely to be absorbed on sediments. The lowest mean TSS of 20.57 mg L −1 was found at lower agricultural land use percentages; in contrast, the highest mean TSS of 90.82 mg L −1 was found at high agricultural land use percentage. The results of the study suggest that NNN, DO, COD and TKN are better predicted for land uses at further reach distances, as opposed to TP and TSS that are better predicted for land uses at closer reach distances.
This study found that LMM with year and site as random variables better explained the NNN, TP, TSS, COD, DO, and TKN water quality parameters variance than linear models. Year and site help explain part of the variation among years and sites for the water quality parameter means. For TP, the linear model explained 20% of the variance at a 10-km reach distance, compared to the LMM with year × site that explained 82% of the variance at a 10-km reach distance. Comparing these results with riparian buffer results, the linear model explained 0.05% of the variance at 10 km, compared to the LMM with year × site that explained 80% of the variance. However, riparian buffer plays a significant role in governing water quality parameter transport and retention, while the linear model and LMM explained less of the variance. In summary, this study found that the land use predicted NNN, TP, TSS, COD, DO, and TKN water quality parameters better using reach distances than buffers.
In general, the use of linear mixed model in the monitoring of water quality has outlined many advantages of its use and how the effects of the fitted models can be used to classify catchments and provide information on the factors that can help optimize local land use and manage water quality. Further research will be required to refine and define more spatial scales, and use of high spatial resolution digital maps to aid improve modeling and temporal sampling will likely better explain the dynamic existence of the relationship between land use and water quality.