Recognition of Patterns of Benthic Diatom Assemblages within a River System to Aid Bioassessment

: Benthic algae, especially diatoms, are commonly used to assess water quality in rivers. However, algal-based assessments are challenging at the river system scale because longitudinal variation in physical habitat conditions may obscure algal responses to changes in water quality. In the present study, we surveyed benthic diatoms and environmental variables from a mountainous Chinese river system. Hierarchical clustering, discrimination analysis, and indicator species analysis were used together to explore associations between distribution patterns of diatom assemblages and water quality variables. Study sites were clustered into five groups based on their diatom community composition, with sites grouped by the sampling months. Chemical oxygen demand (COD), elevation, and total nitrogen (TN) were the most important predictors for site classification. Site groups with higher elevations had higher TN concentrations; however, COD concentrations were higher in lower elevation groups. Moreover, COD concentrations significantly differed between temporally separated groups. In total, 49 indicator species were identified for individual groups, with most taxa indicating the eutrophic condition. Additionally, we found that European diatom indices are not closely associated with water quality variables. We conclude that the identification of algal patterns and their driving forces can provide valuable information to aid bioassessment at the river system scale.


Introduction
Spatio-temporal patterns of lotic assemblages change substantially along anthropogenic stressor gradients (e.g., water quality pollution, hydrological alteration, and sediment erosion). Hence, understanding biological patterns and their driving forces can aid in the identification of the stressors affecting them [1]. For a given river system, the assessment of changes in lotic assemblage structures/patterns and functions responding to stressors, namely bioassessments, could not only indicate the causes of degradation in the biological condition, but also guide proper watershed management practices [2][3][4]. Therefore, in recent decades, lotic bioassessments have been increasingly included as foundational components of watershed management plans [5,6].
However, performing lotic bioassessments at the river system scale pose many challenges [7], especially for relatively small rivers of hundreds of kilometers in length and thousands of km 2 in watershed size. When bioassessments are performed at large scales, e.g., at regional or national scales [8,9], although both natural environmental factors (e.g., temperature, precipitation, geology, and soils) and anthropogenic stressors change significantly, stressors gradients are usually strong enough to drive changes in biological patterns. Thus, significant stressor-response relationships can be developed, and major stressors affecting lotic ecosystems can be effectively detected while accounting for variation in natural environmental factors [10,11]. In contrast, when bioassessments are performed at the river system scale, associations between biological patterns and stressors are difficult to identify, because small-scale spaces may share similar features with natural environmental factors, resulting in relatively homogeneous background water chemistry [12]. Conversely, physical habitat variables, such as elevation, slope, substrate, water depth, current velocity, and riparian cover, do change considerably along stream orders [13]. Therefore, the distributional patterns of lotic assemblages in a river system respond to both longitudinal variation in physical habitat conditions and anthropogenic stressor gradients in the watershed [14][15][16][17]. Naturally, longitudinal variation in biological patterns will obscure stress-response relationships if the stressor gradients are not strong enough [18]. Besides the spatial confounding effects, the effects of anthropogenic stressors and physical habitat variables on lotic assemblages may change temporally, inducing distinctly temporal stress-response relationships [7,19].
Therefore, traditional bioassessment methods may lead to inaccurate results when they are used in river systems without additional analysis. For instance, biological indices that were initially developed to indicate ecological conditions, such as trophic or water quality status, would also be influenced by variation in the natural environment [20,21]. As for reference condition-based methods, assessments may be hard to perform within river system due to limited number of reference sites [22,23]. Reference sites may be relatively easy to find in upstream areas but are scarce or totally absent in downriver areas where there is extensive human activity. It is evident that upstream reference sites cannot represent downriver reference conditions. Under extreme conditions, human disturbances in upstream areas may also be so severe that only a few reference sites can be identified. Under such conditions, assessment accuracy becomes a knotty issue [24]. The application of alternative methods to explore associations between biological patterns and environmental variables could provide valuable information that aids in bioassessments at the river system scale; however, such attempts have been infrequent (but see [18]).
Lotic benthic algae, especially benthic diatoms, respond rapidly to environmental changes because of their nutrient requirements and their position at the base of lotic food webs. Hence, they can provide unique early warning signals of deteriorating conditions and have been extensively used in lotic assessments [25]. In the present study, we used a bottom-up strategy to study the associations between the spatio-temporal patterns of benthic diatom assemblages and water quality in a mountainous Chinese river system. We first performed hierarchical clustering to identify the spatial and temporal patterns of benthic diatom assemblages within the river system. Then, the random forest method was used to identify the contributions of the physical habitat and chemical variables to diatom patterns. Spatial differences in environmental variables were detected with nonparametric tests, and indicator taxa for diatom patterns were identified with an indicator species analysis. Meanwhile, several European diatom indices that are generally used to indicate lotic ecological conditions were also calculated. Linear models were fitted to identify potential driving forces for variation in diatom indices. It was assumed that (1) the contributions of the physical habitat and water quality variables to algal patterns in the river system could be distinguished by combining applications of several analysis methods. Therefore, these analyses may provide valuable aid to bioassessments, (2) European indices would lead to inaccurate assessments if they are used without modification. Our study demonstrates that more consideration of bioassessments within a river system is needed.

Study Area
The Chishui river (104 • 45 -106 • 51 E, 27 • 20 -28 • 50 N) is located in southwestern China [26,27]. The river originates from Zhenxiong County in northern Yunnan Province and runs through 13 counties/cities in Yunnan, Guizhou, and Sichuan province before flowing into the Yangtze River in Hejiang County. The main stream is 436.5 km long with a drainage area of 20,440 km 2 . The elevation of the watershed ranges from 200 to 1800 m, with the terrain gently sloping from the southeast towards the northwest. This area has a subtropical continental monsoon climate with hot, wet summers and cool, dry winters. The annual average air temperature varies from 11.3 • C in the headwaters to 18.1 • C in the downstream sub-basins [28]. The annual precipitation is 1027.2 mm, and approximately 60% of rainfall occurs between June and September.
Although the Chishui river is the only tributary of the upper Yangtze River, where no dam has been built on the main stream, other types of human disturbance are popular [29]. Specifically, the upper reaches of the river suffer from the effects of domestic sewage and agricultural pollution. Villages are situated near the riverbank, and crops have been planted within the riparian zones because there is limited flatland in this mountainous area. The main human disturbances in the middle reach include wine companies, coal mining enterprises, and towns. In the downriver reaches, with the increased urbanization and industrialization, urban and industrial wastewater, combined with agricultural pollution, has had a substantial influence on water quality.

Benthic Diatom Sampling and Identification and Diatom Index Calculation
To account for the considerable temporal variation in assemblage compositions, we sampled benthic algae from 40 main stream river sites in April (the base flow period) and September (the high flow period) 2016 ( Figure 1). During each sampling occasion, 15 samples were collected from the dominant substrate along a reach that was several times longer than the width of the sampling transects (approximately 50-200 m) at each site [30,31]. Generally, algae were sampled from randomly selected rocks (diameter range: 15-60 cm) with the sampling area confined using a circular lid (radius: 2.7 cm). For each rock, the surface within the lid was vigorously scrubbed using a nylon brush and rinsed three to four times with distilled water. Samples from the same site were combined into one composite sample, and the volume was recorded. For several downstream, non-wadable sites where the dominant substrate was sediment, a 60-mL syringe was used to collect the top 1 cm of soft sediment from the lid-delimited area. Sediment from the same site was combined. For wadable sites, algal samples were collected across the stream transects. In contrast, algae were sampled from shorelines at a water depth of <0.5 m at the non-wadable sites because algal growth is limited by deeper water with a higher turbidity [18]. All samples were preserved with 4% formalin in the field for further identification and enumeration.
In the laboratory, sediment samples were first rinsed with distilled water several times to separate algae from mud. After this pretreatment, diatoms were cleaned with concentrated sulfuric acid and concentrated nitric acid and mounted on a microscope slide with neutral balsam (Hushi Brand, Sinopharm Chemical Reagent Co., Ltd., Shanghai, China, refractive index: 1.52) [32]. Diatoms were enumerated and identified using a compound microscope (Olympus CX21: Olympus Optical Co., Tokyo, Japan) at 1000X. At least 600 valves were identified (most to the species level) following the taxonomic references [33][34][35][36][37][38]. Basionyms were examined according to the Catalogue of Diatom Names (Online Version, updated 19 September 2011, http://researcharchive.calacademy.org/ research/diatoms/names/index.asp), and only the currently used names were retained. The number of each taxon presenting at each site was counted to determine the relative abundances. Algal samples and diatom slides were deposited in the lab for future reference.
Four European diatom indices that are widely used in lotic assessments were calculated with OMNIDIA 7 software V 4.2 [39]: the Sladecek index (SLA), the Specific Pollution sensitivity Index (IPS: Indice de Polluo-sensibilité Spécifi que), the Biological Diatom Index (IBD: Indice Biologique Diatomées), and the Trophic Diatom Index (TDI). They were calculated based on the Zelinka-Marvan weighted average equation, in which the relative abundance of each diatom taxa in the community is combined with their ecological valency (sensitivity or optima) and indicator weights (tolerance) [40]. The indicator values (i.e., the sensitivity/optima values or the weights/tolerance values) for each taxon were determined a priori. The four indices differed in several respects, including the number of taxa used to develop the indices, the taxonomic resolution, the indicator values attributed to each taxon, and the information provided. Specifically, SLA is a saprobic index that measures the degree of organic enrichment in waters; IPS evaluates the general water quality by integrating organic pollution, salinity, and eutrophication; IBD was developed to indicate how much the level of organic and global pollution in a water body; and TDI is commonly used to assess the trophic status of rivers and streams [20,40]. weighted average equation, in which the relative abundance of each diatom taxa in the community is combined with their ecological valency (sensitivity or optima) and indicator weights (tolerance) [40]. The indicator values (i.e., the sensitivity/optima values or the weights/tolerance values) for each taxon were determined a priori. The four indices differed in several respects, including the number of taxa used to develop the indices, the taxonomic resolution, the indicator values attributed to each taxon, and the information provided. Specifically, SLA is a saprobic index that measures the degree of organic enrichment in waters; IPS evaluates the general water quality by integrating organic pollution, salinity, and eutrophication; IBD was developed to indicate how much the level of organic and global pollution in a water body; and TDI is commonly used to assess the trophic status of rivers and streams [20,40].

Environmental Variables
For each site, the location (latitude, longitude, and elevation) and slope of the river reach were recorded with a GPS system and a gradiometer, respectively. The current velocity was measured with a current flow meter (FP211, Global Water). pH and conductivity were measured with a portable Yellow Springs Instrument (YSI) meter (Model 33, YSI, Incorporated, Yellow Springs, Greene County, OH, USA). A 600 mL stream water sample was collected and preserved in acidic conditions until chemical analysis. In the laboratory, concentrations of Ammoniacal nitrogen (NH4-N) total nitrogen (TN), Total phosphorus (TP), and Chemical oxygen demand (COD) were measured following the standard methods recommended by the national water monitoring protocol [41].

Statistical Analysis
We first clustered sites with the species composition of diatom assemblages using agglomerative hierarchical clustering based on the Bray-Curtis dissimilarity with a flexible linkage of β = −0.25.

Environmental Variables
For each site, the location (latitude, longitude, and elevation) and slope of the river reach were recorded with a GPS system and a gradiometer, respectively. The current velocity was measured with a current flow meter (FP211, Global Water). pH and conductivity were measured with a portable Yellow Springs Instrument (YSI) meter (Model 33, YSI, Incorporated, Yellow Springs, Greene County, OH, USA). A 600 mL stream water sample was collected and preserved in acidic conditions until chemical analysis. In the laboratory, concentrations of Ammoniacal nitrogen (NH 4 -N) total nitrogen (TN), Total phosphorus (TP), and Chemical oxygen demand (COD) were measured following the standard methods recommended by the national water monitoring protocol [41].

Statistical Analysis
We first clustered sites with the species composition of diatom assemblages using agglomerative hierarchical clustering based on the Bray-Curtis dissimilarity with a flexible linkage of β = −0.25. Bray-Curtis dissimilarity conserves the object space and is commonly used in cluster preprocessing, and a flexible linkage of β = −0.25 has shown good performance in clustering [42]. Prior to clustering, taxa appearing in ≤5% or ≥95% of the sites were excluded because they contributed little to the classification [43], and relative abundance data were arcsine-square root transformed to down weight the influence of the most abundant taxa.
After the optimum number of clusters had been determined, the random forest method was conducted to develop a discrimination model for site classification. Random forest is a boosted tree method that prevents overfitting in classification and regression trees, and releases the strict requirements on data independence and normality of traditional linear methods [44]. This method has been proven to perform better than linear discriminant analysis and generalized dissimilarity modeling in classification analysis [45]. The diatom group for each site that was assigned by hierarchical clustering was used as the response variable, and nine physical and chemical variables that are important to algal distributional patterns were used as predictive variables: COD, NH 4 -N, TN, TP, pH, conductivity, elevation, slope, and velocity. The classification performance was evaluated by the out-of-bag (OOB) error rate. The model was run 100 times, and the average error rate was calculated to offset the randomness in data selection. We successively removed each predictor that made the least contribution from the model, and at each step, the OOB error rate of the new model was compared to that of the model with previous predictors. The process stopped when the new model had a higher error rate than the old one, at which point the remaining predictors were considered to be the most important contributors to the site classification [44]. In addition, Kruskal-Wallis tests were performed to examine whether there were significant differences in these predictors among the site groups. The Mann-Whitney U test was adopted as a post-hoc test for pairwise comparison between site groups.
We further used the indicator species analysis (ISA) to identify which diatom taxa were important to individual site groups. ISA calculates an indicator value (IndVal) for each taxon by combing the relative abundances and frequencies within each classification group [46]. The indicator taxa for each site group are thereafter selected based on their specificity (the probability that a taxon occurs in the target site group) and fidelity (the probability of finding the species in sites belonging to the target site group) to the group. The IndVal ranges from 0 to 1, with 1 representing a perfect indication.
The significance of IndVal was tested with the Monte Carlo randomization technique. Taxa with IndVal ≥ 0.5 and p < 0.05 in permutation tests (1000 times) were considered to be indicators for their respective groups.
A stepwise multiple linear regression analysis was performed to identify the subset of environmental variables that best explained the observed variation in diatom indices. For the initial models, all nine physical and chemical variables were included as predictors, with one of the diatom indices as the response variable. The Akaike Information Criterion (AIC) was adopted to evaluate the improvement of the model when adding or dropping a predictor. After the final model had been fitted, quantile-quantile plots and plots of residuals versus fitted values were checked to determine whether the assumptions of linear regression had been fulfilled. Predictors with extreme values were transformed and the collinearity between predictors was checked before analysis. Regression models were fitted separately for the data from April and September.
All statistical analyses were undertaken in R 3.3.3 (R Core Team 2016). Agglomerative hierarchical clustering, random forest, and indicator species analysis were performed with the "Cluster", "randomForest", and "indicspecies" packages, respectively. Stepwise multiple linear regression was implemented with the "lm" and "step" functions from the base statistical package "stats".

Site Classification
A total of 183 diatom taxa were observed in April (120) and September (107)    for September. Generally, more taxa were observed at the middle and downriver sites than at the upstream sites.
The study sites were classified based on the community composition of diatom assemblages ( Figure 2). The Davies-Bouldin clustering index, a commonly used measurement to determine the optimal number of the cluster, was minimized at 5 and 6. The 5-cluster structure was then chosen for its conciseness. The agglomerative coefficient of the 5-cluster structure, which describes the strength of the clustering structure obtained by a group's average linkage, was 0.82, indicating a reasonable clustering. The clustering results showed clear temporal variation (Figure 2). All April sites were assigned to group 1 (G1:19 sites) or group 2 (G2: 21 sites). G1 was mainly composed of upstream and middle-river sites, while G2 was composed of most downriver sites. September sites were clustered to the other three groups (G3-G5). Most upstream sites in September were assigned to G5 (14 sites), with middle-river sites to G3 (18 sites) and downriver sites to G4 (8 sites). Sites allocated to G1 in April were mainly assigned to G3 (36.8%) and G5 (57.9%) in September. In contrast, G2 sites were assigned to G3 (47.6%) and G4 (38.1%) in September.

Differences in Environmental Variables among the Site Groups
All environmental variables, except slope and velocity, showed significant differences among site groups (Kruskal-Wallis test, p < 0.01, Figure 3). Significant intergroup differences and temporal differences were revealed by the Mann-Whitney U tests. The COD concentrations in the April groups (G1 and G2 medians: 1.7 and 3.2 mg/dm 3 , respectively) were lower than those in the September groups (medians for G3-G5: 9.59, 9.04, and 7.31 mg/dm 3 , respectively). Besides this, the COD concentrations in G1 were lower than those in G2. The elevations of the G1 (median: 735 m) and G5 (median: 999.6 m) sites were higher than those of the other three groups of sites. Although G2, G3,

Differences in Environmental Variables among the Site Groups
All environmental variables, except slope and velocity, showed significant differences among site groups (Kruskal-Wallis test, p < 0.01, Figure 3). Significant intergroup differences and temporal differences were revealed by the Mann-Whitney U tests. The COD concentrations in the April groups (G1 and G2 medians: 1.7 and 3.2 mg/dm 3 , respectively) were lower than those in the September groups (medians for G3-G5: 9.59, 9.04, and 7.31 mg/dm 3 , respectively). Besides this, the COD concentrations in G1 were lower than those in G2. The elevations of the G1 (median: 735 m) and G5 (median: 999.6 m) sites were higher than those of the other three groups of sites. Although G2, G3, and G4 all had relatively low elevations, they differed from one other. The G4 sites had the lowest elevations (median: 224.2 m), with higher elevations for the G2 sites (median: 288.4 m), and even higher for the G3 sites (median: 381.8 m). The TN concentrations in G1, G3, and G5 (medians: 3.81, 3.99, and 3.95 mg/dm 3 , respectively) were higher than those in G2 and G4 (medians: 3.21 and 3.17 mg/dm 3 , respectively). Conductivity, NH 4 -N, and TP were higher in April than in September. The pH displayed opposite temporal changes. and G4 all had relatively low elevations, they differed from one other. The G4 sites had the lowest elevations (median: 224.2 m), with higher elevations for the G2 sites (median: 288.4 m), and even higher for the G3 sites (median: 381.8 m). The TN concentrations in G1, G3, and G5 (medians: 3.81, 3.99, and 3.95 mg/dm 3 , respectively) were higher than those in G2 and G4 (medians: 3.21 and 3.17 mg/dm 3 , respectively). Conductivity, NH4-N, and TP were higher in April than in September. The pH displayed opposite temporal changes. The full random forest model that included all nine environmental variables and the sub-model with COD, elevation, and TN had similar OOB estimates of classification error rate (21.36% for the full model versus 20.94% for the sub-model). COD was the most important determinant of site classification (Figure 4), followed by elevation and TN.

Indicator Taxa for Site Groups
In total, 49 indicator taxa were identified for the five site groups (Table 1). There were two indicator taxa from different genera for G1. G2 had the highest number of indicator taxa (22) from 16 genera. Sixteen indicator taxa were identified for G3, with most taxa from the Navicula sensu lato (7) and Cymbella (4) genera. G4 had eight indicator taxa, five from the motile genera Nitzschia and Navicula [47]. Cocconeis placentula was the only indicator for G5. Overall, approximately 70% of all indicator taxa came from five genera: Navicula (10), Nitzschia (6), Cymbella (8), Achnanthes sensu lato (4), and Amphora sensu lato (5). Taxa from the motile genera Nitzschia or Navicula mainly occurred in G3 and G4.

Indicator Taxa for Site Groups
In total, 49 indicator taxa were identified for the five site groups (Table 1). There were two indicator taxa from different genera for G1. G2 had the highest number of indicator taxa (22) from 16 genera. Sixteen indicator taxa were identified for G3, with most taxa from the Navicula sensu lato (7) and Cymbella (4) genera. G4 had eight indicator taxa, five from the motile genera Nitzschia and Navicula [47]. Cocconeis placentula was the only indicator for G5. Overall, approximately 70% of all indicator taxa came from five genera: Navicula (10), Nitzschia (6), Cymbella (8), Achnanthes sensu lato (4), and Amphora sensu lato (5). Taxa from the motile genera Nitzschia or Navicula mainly occurred in G3 and G4. Table 1. Indicator values (IndVal) of selected diatom taxa for individual site groups. Site groups were identified by a cluster analysis. The numbers in brackets following site group names are the numbers of indicator taxa. A refers to species specificity to a site group, which is the probability that a taxon occurs in a particular target site group; B refers to species fidelity, which is the probability of finding the species at the sites belonging to the target site group.

Environmental Variables Best Explaining the Variation in European Diatom Indices
More than 80% of observed taxa were used to calculate the TDI (April: 81.5%; September: 84.3%) and IPS (April: 80.6%; September: 83.5%). By comparison, approximately 65% and 50% of taxa were included for calculating the IBD (April: 63.4%; September: 64.9%) and the SLA (April: 51.0%; September: 52.9%), respectively. The stepwise multiple linear regression analysis revealed distinct relationships between the four European diatom indices and environmental variables ( Table 2). In April, approximately 20% of the variation in SLA was explained by NH 4 -N and current velocity. In comparison, TP, pH, elevation, and current velocity explained more than 40% of the variation in this index in September. Elevation was the only important environmental variable for the TDI in both months, with a relatively stronger relationship in April. Variation in the IBD was also explained only by elevation in April. Variation in the IPS in both months and the IBD in September was not associated with any environmental variable.

Associations between Algal Patterns and Water Quality Gradients
We found that benthic diatom assemblages in the Chishui river system could be classified into two or three groups depending on the sampling month. In some cases, sites from different longitudinal locations were assigned to the same group. For instance, both upstream and downriver sites belonged to group 1, and group 3 was composed of sites from up-, middle-, and downriver sites. This implies that diatom patterns in this river system are not solely determined by the longitudinal location as observed in some other rivers (Figure 2) [18]. We further found that COD, elevation, and TN were the most important environmental variables affecting the spatio-temporal patterns of diatom assemblages (Figure 3), indicating that both natural habitat and water quality variables are important to diatom patterns in this river system.
The COD was mainly produced by the hundreds of wine enterprises that are located in the middle reach of the river system [48]. These enterprises are distributed near the river for convenience of extracting water. The river's water quality has been severely degraded by wastewater emissions from these enterprises under the current management standards [49]. Organic pollutants that are discharged into rivers can, through oxidation, be converted into nutrients that support algal growth. Other studies have also found that COD plays an important role in diatom patterns in highly disturbed areas [50,51]. Similar to most other studies [52,53], TN was shown to have important effects on diatom patterns in the Chishui river. TN in this river was mainly generated from domestic and agricultural pollution. Contrary to COD, TN concentrations were higher at the upstream sites. This finding is in accordance with the fact that stream banks in upstream reaches of this mountainous watershed are generally occupied by buildings and farmlands which induce severe domestic and agricultural pollution. Elevation, an important natural variable that shapes biological patterns globally [54], also had important effects on diatom patterns in the Chishui river. This variable is a proxy factor for temperature, energy, precipitation, and other variables that are essential to algal distribution. In the present study, elevation decreased longitudinally from the headwaters to the river's mouth. Hence, diatom assemblages also changed significantly between high-and low-elevation sites, irrespective of the sampling month (G1 versus G2 in April and G5 versus G4 in September).
Diatom assemblages in April and September were separated by the first level of clustering (Figure 2), indicating that temporal changes in diatom patterns were more significant than those in spatial variation. The results of the nonparametric tests imply that temporal changes in diatom patterns were mainly associated with changes in the COD concentration. COD concentrations in September were significantly higher than that in April, which could be attributed to increased terrestrial inputs of organic pollutants through rainfall events.
Indicator taxa provided further evidence for associations between diatom patterns and water quality. Nitzschia acicularis, a eutrophic and α-mesosaprobous species [55], was indicative to G1, implying a potential water quality problem in this group. Taxa indicating the eutrophic condition, including Amphora pediculus, Cocconeis pediculus, Cyclotella meneghiniana, Cymbella tumida, Cymbella turgidula, Diatoma moniliformis, Encyonema lacustre, Gomphonema clavatum, Gyrosigma scalproides, Halamphora coffeaeformis, Halamphora montana, Luticola mutica, Luticola stigma, Melosira varians, Navicula antonii, Navicula symmetrica, Nitzschia dissipata, Nitzschia palea, Nitzschia pseudofonticola, and Planothidium lanceolatum, were more representative of the low-elevation groups (G2, G3 and G4). The autecology of most indicator taxa suggests that water quality degradation is extensive in this watershed, and the diatoms were an effective indicator of such type of disturbance. Besides the water quality, the indicator taxa also revealed that there is likely a sediment gradient in the Chishui river system that has increased the propitiation of motile genus Navicula and Nitzschia in low-elevation groups 3 and 4. Smucker et al. [56] demonstrated that an increase in these motile taxa also indicates an increase in human disturbance represented as the watershed's impervious cover percentage.

Potential Applications
Our analysis is helpful for a more accurate understanding of ecological conditions the Chishui river system. We found that the variation in European diatom indices was only partly associated (for the SLA), or unassociated (for the IPS, IBD, and TDI), with water quality variables. Other researchers have obtained similar results and argued that diatom indices derived from the region under study would perform better than that of foreign indices [20,57]. We did not develop diatom indices for the Chishui river because only the main river stream was surveyed. Nevertheless, it would be worthwhile to attempt this if funding for a more extensive survey becomes available in the future. Using a reference condition-based assessment, Chi et al. [29] developed a macroinvertebrate-based multi-metric index for the Chishui river. Compared with other studies [8,58], they set relatively low-quality criteria (TP < 0.20 mg/L, NH 4 -N < 1 mg/L, and COD < 6 mg/L) for screening reference sites. They found that <10% of sites were in "excellent" condition, with some headwater and upstream sites in "fair" or "poor" condition. Their results coincide with our finding that upstream sites were also polluted. As pollution was extensive in the watershed, we did not adopt a reference condition-based bioassessment. Alternatively, we classified sites directly with community compositions of benthic diatoms and identified the environmental variables driving algal patterns. Although we could not measure specific ecological conditions for individual sites with this approach, we did find that an environmental stressor, water quality degradation, affected the river ecosystem.
Thus, the combination of our findings with those of Chi et al. [29] allows a more comprehensive and accurate understanding of this river system's ecological conditions. Diatom patterns, together with their associations with environmental variables in the Chishui river system may be used to guide management practices. First, the upper reaches of this river ecosystem were mainly influenced by nitrogen pollution generated by domestic and agricultural activities on the river banks. Riparian vegetation zones are important buffering and filtering zones for terrestrial inputs of nutrients [59]; therefore, they urgently need to be recovered to protect the water quality in this area. As for the middle-and downriver reaches where the COD loadings were high due to the dense distribution of wine enterprises, more strict management might be considered. For instance, to protect water quality to sustain the high quality of the wine produced here, both the number of wine enterprises and the wastewater emission standards could be revised [48]. Secondly, the COD and TN loadings were shown to be higher in September than in April. This implies that more terrestrial contaminants were transferred to the river through rainfall. Therefore, more effective wastewater collection systems could be developed or updated for pollution control during the rainy season.
In conclusion, both physical habitat variable of elevation and water quality variables of COD and TN were important to the spatio-temporal patterns of diatom assemblages in the Chishui river system. Combining the application of hierarchical clustering, discriminant analysis, indicator species analysis, and nonparametric tests was effective in detecting the associations between algal patterns and water quality. Conversely, the use of European diatom indices would lead to a biased assessment. Overall, our study demonstrates that exploring associations between lotic biological patterns and environmental variables can provide valuable information to aid in bioassessments, especially for river systems where there is substantial natural variation in habitat conditions or where there has been extensive human disturbance.
Author Contributions: T.T. and Z.X. conceived and designed the research; S.G., T.T., and Z.X. performed the fieldwork; S.G. conducted the taxonomic identification and enumeration of diatoms; S.M. analyzed the data and wrote the manuscript; T.T., H.D., and Z.X. contributed to the discussion; and all authors reviewed the manuscript before submission.