Assessing Land-Cover Effects on Stream Water Quality in Metropolitan Areas Using the Water Quality Index

This study evaluated the influence of different land-cover types on the overall water quality of streams in urban areas. To ensure national applicability of the results, this study encompassed ten major metropolitan areas in South Korea. Using cluster analysis, watersheds were classified into three land-cover types: urban-dominated (URB), agriculture-dominated (AGR), and forest-dominated (FOR). For each land-cover type, factor analysis (FA) was used to ensure simple and feasible parameter selection for developing the minimum water quality index (WQImin). Chemical oxygen demand, fecal coliform (total coliform for FOR), and total nitrogen (nitrate-nitrogen for URB) were selected as key parameters for all land-cover types. Our results suggest that WQImin can minimize bias in water quality assessment by reducing redundancy among correlated parameters, resulting in better differentiation of pollution levels. Furthermore, the dominant land-cover type of a watershed not only affects the level and causes of pollution but also influences the temporal patterns, including long-term trends and seasonality, of stream water quality in urban areas in South Korea.


Introduction
Global urbanization is an ongoing trend, with 55% to 68% of the world's population projected to reside in urban areas by 2050 [1].
Urbanization induces multiple stressors, especially land-use/land-cover changes such as deforestation and the growth of industrial and residential areas, resulting in increased impervious surfaces [2][3][4][5]. Consequently, urbanization leads to a deterioration of water quality in streams through an increase in pollution sources and various hydromorphological changes [6][7][8]. Despite their at-risk status, streams in urban areas are crucial water resources with a number of designated uses, such as drinking water supply, recreation, and wildlife conservation [9][10][11][12].
Therefore, it is vital to establish management strategies for preventing or alleviating water quality problems; this requires efforts to monitor and assess stream water quality in urban areas. The water quality index (WQI), an approach that quantitatively integrates a number of chemical, physical, and biological water quality parameters, has been widely used to assess the water quality status of both surface and groundwater systems [13][14][15][16][17]. In recent years the advent of big data and the accumulation of monitored multivariate data has prompted a substantial increase in the application of WQI to environmental and ecological studies [18][19][20]. In many of these studies, the developed WQI has been used to capture long-term trends [21,22], seasonal fluctuations [23,24], or spatial variations [25,26] in the overall stream water quality in urban areas. As well as determining the spatiotemporal patterns of stream water quality in urban areas, previous WQI-based research has also determined pollution sources and anthropogenic effects [27][28][29] and selected the key parameters that represent variations in water quality [30][31][32][33].
Recent assessments of urban stream water quality have increasingly employed parameter selection using a number of statistical methods, highlighting the advantages of this process for cost and time saving during assessment. For example, Wu et al. [33] used stepwise multiple regression, which assumes linearity between the WQI and each parameter, to select five parameters representing the water quality of streams in the highly developed area of Lake Taihu Basin, China. Tripathi and Singal [31] used both principal component analysis (PCA) and correlation analysis to select nine parameters to develop a WQI for the Ganga River, which flows through some highly polluted cities of India. Moreover, linear discriminant analysis was applied by Han et al. [34] to select parameters that most effectively differentiate temporal groups (wet versus dry period) and spatial groups (east vs. west parts of the lake) in the Fu River and Baiyangdian Lake, both of which are located in a highly populated region of northern China.
However, the spatial scales of previous parameter selection studies have been limited to single water bodies or single basins; thus, the parameters selected in these studies have limited applicability to other urban stream ecosystems. Furthermore, the effects of different types of anthropogenic activities (e.g., industry, cultivation, or forestation) on stream water quality in urban areas have rarely been considered [26,35]. To overcome these limitations, this study presents the first attempt, to our knowledge, to explicitly account for the effects of different land-cover types on the water quality response and key water quality parameters of urban streams. This study was conducted on a national scale, encompassing a wide range of hydromorphological and geographical characteristics and socioeconomic backgrounds, which are also key factors influencing water quality [36][37][38][39][40]. Therefore, this study aimed to provide parameter selection results that are both informative and applicable to other unexplored streams in urban areas of South Korea.
Streams across ten major metropolitan areas of South Korea were investigated. Cluster analysis was performed to classify stream watersheds based on their land-cover characteristics. Then, the objective WQI (WQI obj ) was calculated for each land-cover type using all available water quality parameters. The long-term trends of WQI obj were evaluated using the seasonal Mann-Kendall (SMK) test, and only periods exhibiting temporal stability were used in further analyses. For each land-cover type, key parameters were selected using factor analysis (FA) to develop the minimum WQI (WQI min ). The objectives of this study were: (1) To assess the long-term trends and seasonality of the overall stream water quality in metropolitan areas in South Korea; (2) to analyze how different land-cover types affect stream water quality in urban areas and key water quality parameters; and (3) to evaluate the correlation between WQI obj and WQI min and relationships between WQI min and land-covers.

Study Area and Data Description
Ten major metropolitan areas across South Korea, each with a population greater than one million, were included in this study (Seoul, Busan, Incheon, Daegu, Daejeon, Gwangju, Suwon, Ulsan, Changwon, and Goyang; Figure 1) [41]. Within the study area, 81 water quality monitoring sites were selected at tributaries that directly or indirectly flow into the Han, Geum, Nakdong, or Yeongsan Rivers, the four major rivers of South Korea. The selected monitoring sites covered 35 standard watersheds, the smallest unit of the drainage area division system in South Korea (http://wamis.go.kr), with watershed areas ranging from 39 to 294.9 km² and a mean area of 103.29 km². Water quality data were provided by the National Institute of Environmental Research of the Ministry of Environment (http://water.nier.go.kr). The data spanned the period from 2007 to 2018, and the monitoring frequency varied by site from weekly to monthly. Among the 54 water quality parameters initially included in the data, heavy metals and other toxic chemicals, such as mercury, cadmium, arsenic, and cyanide, were excluded because at least 99.5% of the values for these parameters were either missing or below the detection limit. Furthermore, parameters without available reference values were excluded from the analyses. The reference values (i.e., normalization factors and weights) required to develop the Bascarón WQI were taken from previous studies [27,42-44]. Fourteen water quality parameters were included in the analyses: water surface temperature (Temp), electrical conductivity (EC), pH, dissolved oxygen (DO), five-day biochemical oxygen demand (BOD₅), chemical oxygen demand (COD), suspended solids (SS), total nitrogen (TN), ammonium nitrogen (NH₄⁺-N), nitrate nitrogen (NO₃⁻-N), total phosphorus (TP), orthophosphate phosphorus (PO₄³⁻-P), total coliform (TC), and fecal coliform (FC) (Table 1).
Among the 81 monitoring sites initially selected for our study, 58 were included in the water quality assessment because they had measurements for all 14 water quality parameters. Land-cover data were provided by the Environmental Geographic Information System; the year of data collection varied from 2010 to 2018 depending on the region (https://egis.me.go.kr). The land-cover data comprised seven categories: urban (or built-up) land, agricultural land, forested land, grassland, wetland, barren land, and water. For each of the 35 watersheds, the relative proportions of the seven land-cover categories were calculated using QGIS 2.18.16 [45] and ArcGIS 10.3 software [46].

Cluster Analysis (CA)
Cluster analysis (CA) is an unsupervised pattern recognition technique that groups individual objects into clusters such that objects within a cluster are more similar to one another than to objects in other clusters. Among the available CA methods, hierarchical agglomerative CA (HACA) was used in this study. HACA is a successive process in which the two objects in closest proximity first form a cluster at the lowest level of the hierarchy. In each subsequent step, the two objects or clusters in closest proximity are merged into a combined cluster. Merging continues until all objects are linked to form a single cluster at the highest level of the hierarchy. The squared Euclidean distance was used as the measure of proximity between objects/clusters. Furthermore, we employed Ward's minimum variance linkage function, which uses distance information to merge objects into a hierarchical cluster tree; this tree is visually represented by a dendrogram [47]. As HACA ends in a single cluster, the dendrogram needs to be cut at a specific height to generate multiple clusters. The height in the dendrogram can be defined as (D_link/D_max) × 100, where D_link is the linkage distance for a pair of objects/clusters and D_max is the maximum linkage distance.
Following previous studies, the height for dendrogram partitioning was set to 60; that is, (D_link/D_max) × 100 > 60 [48,49]. The CA was performed using the 'dendrogram' function from the 'SciPy' library [50] in Python 3.6 [51]. To generate clusters based on land-cover type, HACA was performed using the relative proportions of the six land-cover categories (all except water) for each standard watershed as variables.
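The clustering and dendrogram cut described above can be sketched as follows. This is a minimal illustration, not the authors' script: the land-cover proportions are hypothetical values, and SciPy's Ward linkage operates on Euclidean (not squared Euclidean) distances, so the linkage heights here may be scaled differently from those in the study.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical proportions of six land-cover categories per watershed
# (urban, agricultural, forest, grass, wetland, barren). NOT the study data.
rng = np.random.default_rng(0)
centres = np.array([
    [0.50, 0.06, 0.30, 0.08, 0.03, 0.03],   # urban-dominated
    [0.16, 0.44, 0.24, 0.08, 0.04, 0.04],   # agriculture-dominated
    [0.12, 0.16, 0.60, 0.06, 0.03, 0.03],   # forest-dominated
])
X = np.vstack([c + rng.normal(0.0, 0.01, size=(4, 6)) for c in centres])

# Hierarchical agglomerative clustering with Ward's minimum-variance linkage.
Z = linkage(X, method="ward")

# Express each merge height as (D_link / D_max) * 100 and cut at 60.
height_pct = Z[:, 2] / Z[:, 2].max() * 100.0
labels = fcluster(Z, t=0.60 * Z[:, 2].max(), criterion="distance")

# Every merge above the cut separates one more cluster.
n_clusters = int((height_pct > 60).sum()) + 1
print(n_clusters, labels)
```

Cutting at a percentage of the maximum linkage distance, rather than fixing the number of clusters in advance, lets the data determine how many groups emerge.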
The differences in water quality parameters and WQI among clusters were assessed using summary statistics and non-parametric tests, i.e., the Kruskal-Wallis H and Mann-Whitney U tests. Non-parametric tests were selected because the water quality parameters were not normally distributed. The Kruskal-Wallis H test examined differences in distribution among the three clusters. When a significant difference was found, the Mann-Whitney U test was used as a post-hoc analysis to identify which cluster(s) differed significantly in distribution from the other cluster(s). The Kruskal-Wallis H and Mann-Whitney U tests were performed using 'kruskal' and 'mannwhitneyu' from the 'SciPy' library [50] in Python 3.6 [51]. Statistical significance was indicated by a p-value < 0.05.
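A minimal sketch of this two-stage testing procedure is given below. The WQI scores are hypothetical, the pairwise tests are assumed to be two-sided, and no multiple-comparison correction is applied because the text does not mention one.

```python
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical WQI scores per land-cover cluster (illustrative values only).
groups = {
    "URB": [60, 62, 64, 66, 68, 70, 72, 74],
    "AGR": [75, 76, 77, 78, 79, 80, 81, 82],
    "FOR": [83, 84, 85, 86, 87, 88, 89, 90],
}

# Stage 1: omnibus Kruskal-Wallis H test across all three clusters.
h_stat, p_overall = kruskal(*groups.values())

# Stage 2: pairwise Mann-Whitney U tests, run only if stage 1 is significant.
pairwise_p = {}
if p_overall < 0.05:
    for a, b in combinations(groups, 2):
        _, p = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        pairwise_p[(a, b)] = p

print(p_overall, pairwise_p)
```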

Water Quality Index (WQI) Development
The WQI used in this study builds on the objective WQI (WQI obj) [43], a modification of the Bascarón WQI, which is also known as the subjective WQI [13]. WQI obj omits the constant term by which the Bascarón WQI is multiplied to reflect a subjective judgment of overall water quality. The WQI obj is calculated as follows:

$$\mathrm{WQI}_{obj} = \frac{\sum_{i=1}^{n} C_i P_i}{\sum_{i=1}^{n} P_i} \qquad (1)$$

where n is the number of available water quality parameters, C_i is a normalization factor that converts the value of parameter i into a common scale ranging from 0 to 100 with an interval of 10 (Table 1), and P_i is the weight indicating the relative importance of parameter i, which ranges from 1 to 4 (Table 1). WQI min, a simplification of WQI obj indicating the minimum WQI [42,43], is calculated as:

$$\mathrm{WQI}_{min} = \frac{\sum_{i=1}^{n_{min}} C_i}{n_{min}} \qquad (2)$$

Note that Equation (2) for WQI min does not include the weight term, indicating that the parameters included in the WQI min assessment are considered equally important. Here, n_min is the number of key parameters, which form a subset of all n available parameters. The WQI obj and WQI min scores were graded into five classes to indicate the overall water quality status: excellent (91-100), good (71-90), medium (51-70), bad (26-50), and very bad (0-25) [42,43,52]. In addition, to evaluate whether WQI min and WQI obj are well correlated, linear regression (WQI obj = a·WQI min + b) was performed using the 'linregress' function from the 'SciPy' library [50] in Python 3.6 [51].
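The two indices and the grading scheme can be illustrated with a short sketch. It assumes the usual Bascarón form, in which WQI obj is the P-weighted mean of the normalized scores C and WQI min is their unweighted mean over the key parameters. The C and P values below are hypothetical, not entries from Table 1, and the treatment of non-integer scores at class boundaries is likewise an assumption.

```python
def wqi_obj(c, p):
    """Weighted mean of normalized scores c using weights p."""
    if len(c) != len(p) or not c:
        raise ValueError("c and p must be non-empty and of equal length")
    return sum(ci * pi for ci, pi in zip(c, p)) / sum(p)


def wqi_min(c_key):
    """Unweighted mean of the normalized scores of the key parameters."""
    if not c_key:
        raise ValueError("at least one key parameter is required")
    return sum(c_key) / len(c_key)


def grade(score):
    """Map a score to a class; scores between integers fall in the lower class."""
    if score > 90:
        return "excellent"
    if score > 70:
        return "good"
    if score > 50:
        return "medium"
    if score > 25:
        return "bad"
    return "very bad"


# Hypothetical normalized scores (C) and weights (P) for five parameters.
scores = {"COD": 60, "FC": 40, "TN": 50, "DO": 90, "pH": 100}
weights = {"COD": 3, "FC": 3, "TN": 3, "DO": 4, "pH": 1}
names = list(scores)

obj = wqi_obj([scores[k] for k in names], [weights[k] for k in names])
mn = wqi_min([scores[k] for k in ("COD", "FC", "TN")])
print(obj, grade(obj), mn, grade(mn))
```

In this made-up example the well-scoring DO and pH values lift WQI obj above WQI min, which is the kind of masking by non-key parameters that the minimum index is meant to avoid.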

Seasonal Mann-Kendall (SMK) Test
The Mann-Kendall (MK) test is a non-parametric test that assesses whether the temporal trend of a variable exhibits a monotonic increase or decrease [53,54]. The SMK test is an extension of the MK test that accounts for the effect of seasonality by performing the test separately for each pre-defined season [55]. In this study, the SMK test was employed to identify the point in time at which WQI no longer showed a significant increasing or decreasing trend; the resulting stabilized time period was divided into training and test sets for further analyses. Based on the CA results, the trend of monthly averaged WQI obj was assessed for each land-cover cluster, initially using the data for the entire time period (2007-2018). When the trend showed a significant increase or decrease (p-value < 0.05), the SMK test was repeated after excluding one year of data from the start of the period. The test was performed iteratively until the trend was no longer significant. The SMK test was performed using the 'seasonal_test' function from the 'pyMannKendall' library [56] in Python 3.6 [51].
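The iterative procedure can be sketched with a small standard-library re-implementation of the seasonal Mann-Kendall statistic. This is not the pyMannKendall code used in the study: it assumes each calendar month is one season, uses the normal approximation with a tie correction and continuity correction, and introduces a hypothetical `min_years` safeguard. The input series is invented.

```python
import math
from collections import Counter


def seasonal_mann_kendall(series):
    """series[y][s] is the value in year y and season s. Returns (z, p)."""
    s_total, var_total = 0, 0.0
    for s in range(len(series[0])):
        x = [year[s] for year in series]
        n = len(x)
        # Kendall S statistic within this season.
        s_total += sum((x[j] > x[i]) - (x[j] < x[i])
                       for i in range(n - 1) for j in range(i + 1, n))
        # Variance of S with correction for tied values.
        ties = sum(t * (t - 1) * (2 * t + 5) for t in Counter(x).values())
        var_total += (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    if var_total == 0 or s_total == 0:
        return 0.0, 1.0
    z = (s_total - math.copysign(1, s_total)) / math.sqrt(var_total)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return z, p


def first_stable_start(series, alpha=0.05, min_years=3):
    """Drop leading years one at a time until the trend is not significant."""
    for start in range(len(series) - min_years + 1):
        _, p = seasonal_mann_kendall(series[start:])
        if p >= alpha:
            return start
    return None


# Hypothetical monthly WQI: rising in early years, then fluctuating,
# with lower values in the wet season (July to September).
yearly = [60, 64, 68, 72, 75, 74, 76, 75, 74, 76, 75, 74]
series = [[y - (5 if m in (6, 7, 8) else 0) for m in range(12)] for y in yearly]

z_full, p_full = seasonal_mann_kendall(series)
start = first_stable_start(series)
print(p_full, start)
```

Because values are only compared within the same month, the wet-season offset does not affect the trend statistic.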

Factor Analysis (FA)
FA attempts to account for the structure (i.e., correlation and variation) of data, consisting of measured variables with a reduced number of factors, which are also termed latent variables. Exploratory FA (EFA), which does not assume a priori relationships among the factors and measured variables, was used to reveal the underlying factors behind the correlations among measured water quality parameters. Contrary to confirmatory FA, EFA does not posit any relationship between specific factors and measured variables. Therefore, it suits the purpose of this analysis.
Prior to analysis, the values of all water quality parameters except Temp and pH were log-transformed, and the values of all parameters were standardized to have a mean of zero and a standard deviation of one. To examine whether the water quality data were suitable for FA, the Kaiser-Meyer-Olkin (KMO) test [57] and Bartlett's test [58] were performed. The FA was assumed to be valid when the KMO value exceeded 0.5 and the result of Bartlett's test was significant (p-value < 0.05).
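As a rough illustration of these suitability checks, the sketch below computes the overall KMO measure and Bartlett's test of sphericity from a correlation matrix using their textbook formulas. The data are synthetic, a base-10 logarithm is assumed, and for brevity every column is log-transformed (the study left Temp and pH untransformed).

```python
import numpy as np
from scipy.stats import chi2


def standardize_log(raw):
    """Log10-transform positive values, then z-score each column."""
    logged = np.log10(raw)
    return (logged - logged.mean(axis=0)) / logged.std(axis=0, ddof=1)


def kmo_overall(R):
    """Overall Kaiser-Meyer-Olkin measure from a correlation matrix R."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                    # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # off-diagonal mask
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)


def bartlett_sphericity(R, n):
    """Bartlett's test that R is an identity matrix. Returns (stat, p)."""
    p = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    dof = p * (p - 1) / 2.0
    return stat, chi2.sf(stat, dof)


# Synthetic data: seven positive-valued variables driven by two latent factors.
rng = np.random.default_rng(1)
f = rng.standard_normal((300, 2))
logs = np.hstack([f[:, [0]] + 0.5 * rng.standard_normal((300, 4)),
                  f[:, [1]] + 0.5 * rng.standard_normal((300, 3))])
X_std = standardize_log(10.0 ** logs)

R = np.corrcoef(X_std, rowvar=False)
kmo = kmo_overall(R)
stat, p_value = bartlett_sphericity(R, n=X_std.shape[0])
print(kmo, p_value)
```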
To determine the number of factors retained in the FA, Horn's parallel analysis (PA) was used [59]. PA compares the eigenvalues (which indicate the relative importance of a factor in explaining the variance of measured variables) from measured data with the eigenvalues from random data, which have the same sample size and number of variables as the measured data and are obtained using a Monte-Carlo simulation. The differences between the eigenvalues from measured data and the mean eigenvalues from random data were calculated. Factors with differences greater than zero were retained in the FA.
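Horn's parallel analysis as described above can be sketched in a few lines. The data, number of simulations, and seed are assumptions for illustration, and factors are counted from the largest eigenvalue down until the first non-positive difference, which is one common reading of the retention rule.

```python
import numpy as np


def sorted_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X, in descending order."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]


def parallel_analysis(X, n_iter=200, seed=0):
    """Return (number of factors to retain, observed minus random eigenvalues)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = sorted_eigenvalues(X)
    # Monte Carlo: mean eigenvalues of random normal data of the same shape.
    random_mean = np.zeros(p)
    for _ in range(n_iter):
        random_mean += sorted_eigenvalues(rng.standard_normal((n, p)))
    random_mean /= n_iter
    diff = observed - random_mean
    n_keep = 0
    for d in diff:
        if d <= 0:
            break
        n_keep += 1
    return n_keep, diff


# Synthetic data with two underlying factors (four and three variables).
rng = np.random.default_rng(42)
f = rng.standard_normal((500, 2))
X = np.hstack([f[:, [0]] + 0.4 * rng.standard_normal((500, 4)),
               f[:, [1]] + 0.4 * rng.standard_normal((500, 3))])

n_factors, diff = parallel_analysis(X)
print(n_factors)
```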
As a method for factor extraction, principal component analysis (PCA) was used [60,61]. The maximum likelihood method, another common method for factor extraction, was not selected because of its multivariate normality requirement, which is often not met for water quality parameters even after the transformation (e.g., log-transformation) of values. Squared factor loading, which reflects the proportion of variance in a measured variable explained by each factor, was calculated as a result of PCA implementation. The communality was calculated by summing the squared factor loadings of a given variable across all factors to indicate the proportion of variance in a measured variable explained by all factors. Moreover, the uniqueness was calculated by subtracting the communality from the total variance of a variable.
Factor rotation (i.e., the change in the axes of factors) was implemented to yield interpretable factors by attaining a simple structure for factor loadings. Without rotation, most variables load heavily onto the first few factors, whereas rotation yields a simple structure in which each variable loads heavily onto only one factor and lightly onto the others. Varimax rotation, a common type of orthogonal rotation, was used as the rotation method. Orthogonal rotation assumes that factors remain uncorrelated with one another. FA was performed using the 'principal' function from the 'psych' package [62] in R 3.5.3 [63].
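The extraction and rotation steps can be illustrated in Python with NumPy. The sketch below is not the 'psych' implementation: it uses a small hypothetical correlation matrix, a widely used SVD-based varimax iteration, and omits Kaiser row normalization, which some implementations apply.

```python
import numpy as np


def pca_loadings(R, k):
    """Loadings of the first k principal components of correlation matrix R."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1][:k]
    return vecs[:, order] * np.sqrt(vals[order])


def varimax(L, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a loading matrix L (no normalization)."""
    p, k = L.shape
    rot = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        LR = L @ rot
        grad = L.T @ (LR ** 3 - LR @ np.diag(np.sum(LR ** 2, axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt
        new_crit = s.sum()
        if crit != 0 and new_crit / crit < 1 + tol:
            break
        crit = new_crit
    return L @ rot


# Hypothetical correlation matrix: variables 0-1 and 2-3 form two groups.
R = np.array([
    [1.0, 0.8, 0.2, 0.2],
    [0.8, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.7],
    [0.2, 0.2, 0.7, 1.0],
])

unrotated = pca_loadings(R, k=2)
rotated = varimax(unrotated)

# Communality: sum of squared loadings across factors; uniqueness: remainder.
communality = np.sum(rotated ** 2, axis=1)
uniqueness = 1.0 - communality
print(np.round(rotated, 3), np.round(communality, 3))
```

Because the rotation is orthogonal, communalities are unchanged by it; only the distribution of each variable's loading across the factors changes.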
For each land-cover type, the water quality parameter that showed the highest factor loading on each retained factor was interpreted as a key water quality parameter. Accordingly, the number of retained factors corresponded to the number of key parameters representing the overall stream water quality in urban areas. The FA procedure was performed for each land-cover type determined by CA, and the key parameters for the different land-cover types were used for the WQI min calculation.
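The selection rule can be expressed compactly as below. The loading values are invented for illustration and are not taken from Table 3; the use of the absolute loading is an assumption that allows for negatively signed factors.

```python
def key_parameters(loadings):
    """For each factor, return the parameter with the largest absolute loading."""
    n_factors = len(next(iter(loadings.values())))
    return [max(loadings, key=lambda name: abs(loadings[name][f]))
            for f in range(n_factors)]


# Hypothetical rotated loadings on three factors (NOT the values in Table 3).
loadings = {
    "COD":   [0.88, 0.10, 0.15],
    "BOD5":  [0.85, 0.20, 0.12],
    "FC":    [0.12, 0.90, 0.05],
    "TC":    [0.10, 0.86, 0.08],
    "TN":    [0.20, 0.05, 0.91],
    "NO3-N": [0.18, 0.07, 0.89],
}

keys = key_parameters(loadings)
print(keys)
```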

Land-Cover Characteristics of Metropolitan Areas in South Korea
Using the HACA, three clusters were generated based on the land-cover characteristics of 35 watersheds in ten major metropolitan areas (Figure 2a). Notably, each of the watersheds included in each of the three clusters had a single dominant land-cover: Urban, agriculture, and forest, respectively ( Figure 2b, Table S1). The mean proportion of urban land for the 15 watersheds with urban-dominated land-cover (URB) was 0.50 (± one standard deviation of 0.12), which was higher than that of agricultural (0.06 ± 0.05) and forested (0.30 ± 0.10) land. In contrast, the five watersheds with agriculture-dominated land-cover (AGR) had a mean relative area of 0.44 (± 0.08) for agricultural land-cover, which was more dominant than urban (0.16 ± 0.07) and forested (0.24 ± 0.07) land-cover. The 15 watersheds with forest-dominated land-cover (FOR) were mainly composed of forested land, with a mean proportion of 0.60 (± 0.08), whereas the proportion of urban (0.12 ± 0.06) and agricultural (0.16 ± 0.04) land was relatively minor. The three land-cover types (URB, AGR, and FOR) were unevenly distributed across the metropolitan areas. Among the URB, 73.3% were concentrated in Seoul (nine watersheds) and its adjacent cities, Suwon (one watershed) and Incheon (one watershed). Three of the five AGR were located in Gwangju, whereas the other two were located in Busan and Changwon. The spatial distribution of FOR was also concentrated, with 33.3% in Daejeon and 26.7% in Daegu.

Land-Cover Effects on Stream Water Quality in Urban Areas
The long-term trends of overall water quality calculated using all available parameters (WQI obj), based on the results of SMK tests, differed by land-cover type (Figure 3). For URB, WQI obj values gradually improved until becoming stable in 2015 (Figure 3a). In comparison, WQI obj values for AGR showed a greater improvement in early years before becoming stable in 2012 (Figure 3b). For FOR, WQI obj values did not show any significant trend during the entire period from 2007 to 2018 (Figure 3c). In more recent years (2015-2018), during which all land-cover types exhibited a stable trend, the overall water quality was worst for URB (p-values < 0.05 in the Kruskal-Wallis H and Mann-Whitney U tests), as indicated by lower WQI obj values (75.04 ± 9.90) than those for AGR (78.91 ± 8.31) and FOR (82.82 ± 7.97). Regardless of the land-cover type and time period, WQI obj values tended to be lower during the wet season (July to September) than during the dry season (Figure 3).
The land-cover types of the watersheds influenced most water quality parameters in urban streams except for pH, EC, DO, and PO₄³⁻-P, which were similar regardless of the dominant land-cover (Table 2). Compared with URB and AGR, FOR exhibited the lowest level of contamination for the majority of water quality parameters. The relative level of contamination in URB and AGR differed depending on the water quality parameter. In terms of nitrogen (TN and NO₃⁻-N) and microbiological indicators (TC and FC), the streams in URB exhibited significantly worse conditions than those in AGR (Table 2). On the other hand, indicators for organic matter (BOD₅ and COD) and turbidity (SS) indicated significantly higher levels of water contamination in AGR than in URB (Table 2).

Key Water Quality Parameters for Different Land-Cover Types
The water quality data were suitable for the application of FA, as indicated by the results of the KMO test (0.82 for URB, 0.67 for AGR, and 0.73 for FOR) and Bartlett's test (p-value < 0.05 for all land-cover types). To perform the FA, the data measured during the more recent years (2015-2018), when the WQI obj values had stabilized for all land-cover types, were divided into training (2015-2016) and testing (2017-2018) data sets. The results of FA using the training data indicated that three factors should be retained for each of URB, AGR, and FOR (Table S2). For each land-cover type, the water quality parameter with the highest factor loading on each of the retained factors was selected as a key parameter for the WQI min calculation (Table 3). Frequently, for a given factor, more than one water quality parameter had a factor loading greater than 0.75 [64], which is indicative of a strong correlation between the factor and the parameter (Table 3). In such cases, the parameters were generally highly correlated with each other, with Pearson's correlation coefficients ranging from 0.49 to 0.88 (Figure S1). Consequently, the three key parameters selected for URB were COD, FC, and NO₃⁻-N, in order of corresponding factors (Table 3). The three parameters selected for AGR were FC, COD, and TN (Table 3). The three parameters selected for FOR were COD, TN, and TC (Table 3).

Comparison between WQI obj and WQI min
Using the test data, the relationships between monthly WQI min and WQI obj values were assessed; WQI min and WQI obj generally exhibited moderate to strong linear relationships, with R² values of 0.66 for URB, 0.78 for AGR, and 0.73 for FOR (Figure 4). For both WQI obj and WQI min, URB was generally associated with the poorest overall water quality, with mean WQI values of 75.79 and 67.20, respectively. Further, based on both WQI obj and WQI min, the overall water quality for AGR (mean WQI values of 78.86 and 73.39) was generally poorer than that for FOR (mean WQI values of 82.41 and 77.41). The location of intersection, where the regression line and one-to-one line cross, differed by land-cover type: 87.48 for URB, 81.98 for AGR, and 88.14 for FOR (Figure 4). Below the intersection, WQI obj values tended to be higher than WQI min scores, whereas the opposite was true above the intersection (Figure 4). As the proportion of values below the intersection was greatest for URB, the positive difference between the mean WQI obj and WQI min values for URB (8.59) was greater than that for AGR (5.47) and FOR (5.00). Within each land-cover type, the variation of WQI min values, with one standard deviation of 13.61 for URB, 13.05 for AGR, and 12.09 for FOR, was greater than the variation of WQI obj values, with one standard deviation of 9.27 for URB, 8.62 for AGR, and 7.97 for FOR. Note that the degree of variation in WQI values, in descending order, was URB, AGR, and FOR for both WQI obj and WQI min.

Table 3. Factor loadings for 14 water quality parameters for watersheds with urban-dominated (URB), agriculture-dominated (AGR), and forest-dominated (FOR) land-cover. Asterisks (*) indicate a factor loading greater than 0.75 or the highest factor loading in the factor. Var (%) represents the percentage of total variance explained by each factor.
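The intersection values reported in this section follow directly from the regression WQI obj = a·WQI min + b: setting WQI obj equal to WQI min gives an intersection at b/(1 − a). The short sketch below works through this with hypothetical coefficients, since the fitted slopes and intercepts are not reported in the text.

```python
def intersection(a, b):
    """WQI value at which the line y = a*x + b crosses the one-to-one line."""
    if a == 1:
        raise ValueError("line is parallel to the one-to-one line")
    return b / (1 - a)


def predicted_obj(wqi_min, a, b):
    """WQI obj predicted from WQI min by the fitted regression."""
    return a * wqi_min + b


# Hypothetical regression coefficients (slope below one, positive intercept).
a, b = 0.5, 44.0
cross = intersection(a, b)

low, high = 60.0, 95.0
print(cross, predicted_obj(low, a, b), predicted_obj(high, a, b))
```

With a slope below one, predicted WQI obj exceeds WQI min below the intersection and falls short of it above, which matches the pattern described for Figure 4.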

Spatial Distribution of Overall Stream Water Quality in Urban Areas
WQI values by site, calculated for 2015-2018, indicated that WQI obj and WQI min values were highly linearly correlated, with an R² value of 0.84 (Figure 5b). However, there was a clear tendency for WQI obj values to be higher than WQI min values (Figure 5a,b). The difference in values between WQI obj and WQI min led to differences in WQI classification at 25.9% of the 58 monitoring sites (Figure 5a). In Seoul, the change in calculation method from WQI obj to WQI min yielded a change of classification from good to medium at 33.3% of 18 monitoring sites. In five other metropolitan areas (i.e., Daejeon, Gwangju, Daegu, Busan, and Ulsan), a change in classification occurred at one or two sites, accounting for 7.7-40.0% of the sites in each area (Figure 5a). In the remaining four metropolitan areas (i.e., Goyang, Suwon, Incheon, and Changwon), no change in WQI classification occurred in response to application of the WQI min (Figure 5a).

Seasonality of Overall Stream Water Quality in Urban Areas
From 2015 to 2018, the monthly patterns of overall water quality calculated using WQI min differed by land-cover type ( Figure 6). For URB, which exhibited the worst overall water quality, the proportion of WQI min values corresponding to equal to or worse than medium status increased during the wet season (July to September), whereas the proportion of good to excellent status sites increased during the dry season (all other months) (Figure 6a). For FOR, the WQI min status was consistently better than or equal to medium, and the proportion of medium status sites increased during the wet season (Figure 6c). For AGR, the WQI min status tended to worsen during the wet season, with an increase in the proportion of medium status sites; however, this seasonality was less consistent compared with other land-cover types (Figure 6b).

Suitability of FA as a Parameter Selection Method
In this study, FA, which involves factor extraction and rotation processes, was used to reduce multiple intercorrelated physical, chemical, and biological water quality parameters into a smaller number of latent factors, and to select key water quality parameters that had the strongest correlation with a given latent factor. In previous studies, along with subjective judgments [27,33,[65][66][67][68][69], multivariate statistical techniques were employed to select parameters on an objective basis. For example, stepwise multiple regression has been used [33,69,70] to determine the set of parameters that could best explain the variance of WQI obj . Compared with unsupervised learning (e.g., FA), regression is a supervised method that requires reference values; in this case, WQI obj values for training data. However, because of the multi-collinearity and the resulting bias, WQI obj is not often a suitable reference.
Furthermore, previous studies have used PCA as a first step, followed by Pearson's correlation analysis, to extract water quality parameters that showed high contributions to selected components and low correlations with other parameters [31,66]. Post-hoc correlation analysis was required because the first few components derived from PCA are strongly associated with most of the correlated parameters. Therefore, the application of PCA alone is not sufficient to identify key parameters that represent the extracted factors. To address this limitation, in this study PCA was conducted in conjunction with factor rotation, which yields a simple structure for the factor loading matrix in which only a small number of variables have high loadings on a given factor and these variables do not overlap among the factors. As a result, parameters with high loadings on a given factor appear to be more distinct and homogeneous. Therefore, a set of parameters with high loadings across all factors is expected to represent multifaceted aspects of water quality. Furthermore, the use of varimax rotation as a factor rotation method ensures that the extracted factors are uncorrelated with one another, facilitating the selection of key parameters whose relationships can be assumed to be independent. Therefore, factor rotation used in conjunction with PCA does not require subsequent correlation analysis, which simplifies parameter selection to a single-step process.

Key Water Quality Parameters
The selected key water quality parameters were similar among the land-cover types (COD, FC, and NO₃⁻-N for URB; COD, FC, and TN for AGR; COD, TC, and TN for FOR), indicating that the relationships among parameters were consistent regardless of land-cover type. For example, across all land-cover types, COD, BOD₅, and SS were closely correlated (Figure S1) and had high loadings on the same factor (Table 3). These high correlations arise because all three parameters reflect biodegradable organic matter. In addition, because one parameter in each pair is a subset of the other, FC and TC, as well as NO₃⁻-N and TN, were closely related to each other (Figure S1) and had the highest loadings on the same factor for all land-cover types (Table 3). Note that phosphorus parameters showed moderate to strong associations with TC and FC within the same factor for all land-cover types (Table 3). Therefore, either TC or FC, which showed higher loadings on the factor than the phosphorus parameters, was selected as the key parameter instead of a phosphorus parameter. One possible explanation for this co-occurrence is that phosphorus and fecal indicator bacteria may originate from the same pollution source (e.g., domestic sewage and agricultural runoff) or the same mechanism (e.g., sediment release), but future research will be necessary to establish the causal relationships. The presence of multiple parameters with almost equally high loadings on a given factor necessitated comparisons between WQI min and a modified WQI min, in which a key parameter (e.g., COD) is replaced by a surrogate parameter (e.g., BOD₅) that was strongly related to the key parameter within the same factor. The results showed that the modified WQI min was generally in close agreement with WQI min (Figure S2), suggesting that parameters with high loadings on the same factor can be used interchangeably.
Note that, compared with other sets of parameters, the linear relationships between WQI min and modified WQI min for fecal indicator bacteria were weaker because of the large variability inherent in FC and TC concentrations. Nonetheless, given the marginal differences in factor loading between TC and FC regardless of land-cover type (Table 3), the choice between these two parameters as the key parameter should depend on management focus or data availability.
The results of FA need to be interpreted and applied with care. The factor extraction process of FA determines the factors worth retaining, and the subsequent factor rotation, whereby the factors become least correlated with each other, yields the proportion of variance explained by a given factor to be distributed more evenly among the factors. Therefore, it is not particularly valid to prioritize the factors and the consequent key parameters. Instead, the selected key parameters should be considered independently of each other and as equally important. In this regard, assigning different weights to key water quality parameters with equal importance should not be included as a step for WQI min development. Previous studies reported that using weights improved the linearity between WQI min and WQI obj [33,70]. In contrast to these findings, we found that the use of weights, which were estimated based on two methods, the relative weight [33,70] and the percentage of variance explained by the given factor (Table 3), yielded only slight differences in the WQI min -WQI obj relationships ( Figure S3).
It should be acknowledged that the water quality data used in this study did not include several widely measured parameters, such as minerals, salts, metals, and flow rate. If such parameters were added to the data, FA might yield additional factors and key parameters. Moreover, the basic water quality parameters Temp and pH were not selected as key parameters for any land-cover type. In addition, despite being frequently included as a key parameter in previous studies [42,43,68–70], DO was not selected for any land-cover type in this study (Table 3). Variations in Temp, pH, and DO may be influenced by anthropogenic activities but are also attributable to natural variability; that is, they exhibit diurnal fluctuations and are strongly influenced by meteorological conditions [33,65]. Our results suggest that Temp, pH, and DO, whose patterns are substantially influenced by natural variations, may not successfully capture the total variance of stream water quality in urban areas and may not be suitable as key parameters.

Comparison between WQImin and WQIobj
Results for the test data showed that WQImin and WQIobj have close linear relationships across all land-cover types (Figure 4), suggesting that WQIobj can be predicted from WQImin using the established regression model. However, WQImin values tended to be higher than WQIobj above a certain threshold and lower than WQIobj below this threshold. This tendency indicates that the use of WQImin reduces the "eclipse effect" [71], which arises from the redundancy inherent in WQIobj; that is, WQIobj tends to overrate poor water quality and underrate good water quality. The removal of redundancy was also evidenced by the larger variance of WQImin compared with that of WQIobj for all land-cover types (Figure 4). Therefore, the development and use of WQImin is expected to improve the identification of the overall water quality status and the level of water pollution in streams across urban areas. Our results demonstrate that the choice of method for WQI assessment has important resource and management implications. Changing the method from WQIobj to WQImin altered the spatial distribution of the overall water quality status; this status change occurred in a minor to substantial portion of monitoring sites, depending on the metropolitan area (Figure 5). This change suggests that the use of WQImin instead of WQIobj, which may involve a status change from "good" to "medium" or vice versa, may affect priority setting and resource allocation among individual watersheds or groups of watersheds.
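The threshold behavior and the variance comparison described above can be reproduced on synthetic data with a short Python sketch. If WQIobj is regressed on WQImin with a slope below 1, the two indices coincide at intercept / (1 − slope); above that value WQImin exceeds the predicted WQIobj, and below it the reverse holds. The simulated slope, intercept, noise level, and sample size are assumptions chosen for illustration and do not correspond to the regression reported in Figure 4.

```python
import numpy as np

# Synthetic paired index values; not data from the study.
rng = np.random.default_rng(0)
wqi_min = rng.uniform(30.0, 95.0, size=200)
wqi_obj = 20.0 + 0.75 * wqi_min + rng.normal(0.0, 2.0, size=200)

# Linear regression of WQIobj on WQImin.
slope, intercept = np.polyfit(wqi_min, wqi_obj, 1)

# Index value at which the regression line meets the 1:1 line.
crossover = intercept / (1.0 - slope)

# Ratio of sample variances; a value above 1 means WQImin is more spread out.
var_ratio = np.var(wqi_min, ddof=1) / np.var(wqi_obj, ddof=1)

print(f"slope = {slope:.2f}, crossover = {crossover:.1f}, variance ratio = {var_ratio:.2f}")
```

A slope below 1 and a variance ratio above 1 describe the same pattern from two angles: the index built from many correlated parameters is compressed toward the middle of the scale relative to the index built from a few key parameters.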

Land-Cover Effects on Stream Water Quality in Urban Areas
Our results indicate that the dominant land-cover type affected the overall stream water quality in urban areas, with mean values of both WQIobj and WQImin decreasing in the order FOR > AGR > URB (Figure 4). The dominant land-cover type also contributed to the deterioration of different water quality parameters (i.e., nitrogen and microbiological indicators for URB, but organic matter and turbidity for AGR) (Table 2). The long-term trends of overall water quality differed by land-cover type (Figure 3). Over the last decade, WQIobj trends for URB and AGR exhibited early improvement before becoming stable, whereas the trend for FOR did not change significantly (Figure 3). These patterns suggest that, across the country, management programs implemented to control point or non-point sources for URB and AGR were effective in improving overall stream water quality [72][73][74][75]. Moreover, the implementation of conservation measures against continuing development pressures in metropolitan areas played a role in maintaining the water quality in FOR. Furthermore, the land-cover type influenced the seasonality of overall water quality (Figure 6). In recent years (2015-2018), the seasonal patterns of WQImin differed between URB and FOR, whereas AGR exhibited less obvious seasonality. The less consistent seasonality for AGR may be partly attributable to its small sample size (n = 287, compared with n = 1881 for URB and n = 1162 for FOR). During the wet season, both URB and FOR exhibited a negative change in overall water quality, with an increase in the proportion of "medium" and "good" status sites relative to "excellent" status sites (Figure 6). For URB, which typically has high proportions of impervious surfaces, stormwater runoff may play a significant role in decreasing overall water quality during the wet season [76][77][78].
Moreover, an increase in sediment discharge as well as sediment perturbation during rainfall events may facilitate the release of pollutants into surface water [79][80][81][82], resulting in a decrease in overall water quality during the wet season in both URB and FOR. In contrast, subsequent to the wet season, when dilution effects can occur [83][84][85], URB alone exhibited an increase in the proportion of "bad" status sites relative to "medium" and "good" status sites (Figure 6). This indicates that not only non-point sources but also point sources, such as wastewater treatment plant effluent, are significant sources of pollution for URB.

Conclusions
This study provided a statistical framework for implementing parameter selection in order to develop an objective WQImin in a single-step process. Comparisons between WQIobj and WQImin suggested that WQImin calculated with the key parameters yielded results comparable to those of WQIobj. Furthermore, WQImin reduced the eclipse effects arising from the use of correlated parameters for water quality assessment, resulting in a better differentiation between good and bad water quality statuses. These results have implications for management authorities, especially those that wish to launch their own monitoring networks but have limited resources. In this context, our results can be used to reduce monitoring demands by prioritizing a minimal number of water quality parameters. The results of WQImin confirmed that the dominant land-cover type of watersheds influences multidimensional aspects of urban stream water quality, namely, the overall degree and level of pollution as well as long-term and seasonal patterns. To confirm our results, future studies should include a broader set of water quality parameters with diverse characteristics.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4441/12/11/3294/s1. Figure S1: Matrices of Pearson's correlation coefficients for the period 2015-2016 among 14 water quality parameters for (a) urban-dominated (URB), (b) agriculture-dominated (AGR), and (c) forest-dominated (FOR) land-cover. Water quality parameters with high factor loadings (>0.75) on the same factor are outlined in the same color. Figure S2: Relationships between the minimum water quality index (WQImin) and modified WQImin from 2015 to 2018. To develop the modified WQImin, key parameter values were predicted using the established linear relationship between a key parameter and a surrogate parameter. Then, predicted values were converted into normalization factors for WQImin calculation. In the x-axis label, WQImin (COD → BOD5) indicates that biochemical oxygen demand (BOD5) was used as the surrogate for the key parameter of chemical oxygen demand (COD). Black dotted lines indicate 1:1 lines. Figure S3: Relationships between objective and minimum water quality indices (WQIobj and WQImin) from 2017 to 2018. Weights were determined using two methods; for a-c, a relative weight was assigned to each key parameter and for d-f, the percent variance explained by a given extracted factor was assigned to each key parameter. Black dotted lines and blue dashed lines indicate 1:1 lines and regression lines, respectively. Table S1: Proportions of three land-cover categories (urban, agricultural, and forested land) for urban-dominated watersheds (URB), agriculture-dominated watersheds (AGR), and forest-dominated watersheds (FOR). Table S2: Parallel analysis results comparing eigenvalues and simulated mean eigenvalues for urban-dominated (URB), agriculture-dominated (AGR), and forest-dominated (FOR) land-cover. The simulated mean eigenvalue indicates the mean eigenvalue calculated from randomly generated simulation data.
Asterisks (*) indicate that the eigenvalue is higher than the corresponding simulated mean eigenvalue.