Developing Indicators of Nutrient Pollution in Streams Using 16S rRNA Gene Metabarcoding of Periphyton-Associated Bacteria

Indicators based on nutrient-biota relationships in streams can inform water quality restoration and protection programs. Bacterial assemblages could be particularly useful indicators of nutrient effects because they are species-rich, important contributors to ecosystem processes in streams, and responsive to rapidly changing conditions. Here, we sampled 25 streams weekly (12–14 times each) and used 16S rRNA gene metabarcoding of periphyton-associated bacteria to quantify the effects of total phosphorus (TP) and total nitrogen (TN). Threshold indicator taxa analysis identified assemblage-level changes and amplicon sequence variants (ASVs) that increased or decreased with increasing TP and TN concentrations (i.e., low P, high P, low N, and high N ASVs). Boosted regression trees confirmed that relative abundances of gene sequence reads for these four indicator groups were associated with nutrient concentrations. Gradient forest analysis complemented these results by using multiple predictors and random forest models for each ASV to identify portions of TP and TN gradients at which the greatest changes in assemblage structure occurred. Synthesized statistical results showed bacterial assemblage structure began changing at 24 μg TP/L with the greatest changes occurring from 110 to 195 μg/L. Changes in the bacterial assemblages associated with TN gradually occurred from 275 to 855 μg/L. Taxonomic and phylogenetic analyses showed that low nutrient ASVs were commonly Firmicutes, Verrucomicrobiota, Flavobacteriales, and Caulobacterales, Pseudomonadales, and Rhodobacterales of Proteobacteria, whereas other groups, such as Chitinophagales of Bacteroidota, and Burkholderiales, Rhizobiales, Sphingomonadales, and Steroidobacterales of Proteobacteria comprised the high nutrient ASVs. Overall, the responses of bacterial ASV indicators in this study highlight the utility of metabarcoding periphyton-associated bacteria for quantifying biotic responses to nutrient inputs in streams.


Introduction
Nutrient pollution of freshwater ecosystems is a persistent and growing concern around the world [1][2][3], often being the most common stressor affecting streams [4][5][6][7]. Inputs of nitrogen or phosphorus create environmental stress by causing changes in the primary producer communities, reductions in biodiversity, alterations to food webs, hypoxia, and harmful algal blooms [8][9][10][11][12]. These negative impacts are expected to continue to increase as human populations grow, requiring more food production and watershed development, leading to further stress and alteration of freshwater and coastal ecosystems [13][14][15]. The development of methods for measuring ecosystem responses to nutrient pollution is therefore crucial for understanding how to mitigate or prevent ecological damage. Decision makers, regulatory agencies, and other end users need the best tools possible to measure, address, and alleviate the negative effects of nutrient pollution.
In aquatic environments, the benthic bacterial assemblage is part of the periphyton (or stream biofilm community) that is critical to biogeochemical processing. Stream biogeochemistry is driven by water quality and abiotic-biotic interactions within the periphyton matrix [16,17]. Bacteria are biogeochemical workhorses, responsible for important processes such as nitrification, denitrification, metabolism of pollutants, and producing hydrolytic enzymes that promote the turnover of organic carbon and nutrients [18][19][20][21]. These bacteria-mediated processes, coupled with their abilities to quickly replicate and increase biomass in response to changes in environmental conditions [22,23], offer a basal response to changing nutrient conditions in streams [24]. As a result, the responses of the bacterial component of stream periphyton to nitrogen and phosphorus concentrations can be used as indicators that provide context for understanding nutrient effects on streams [25,26]. Tools that help elucidate periphytic microbial responses to nutrient pollution therefore can provide valuable information for understanding and managing the effects of nitrogen and phosphorus pollution.
Aquatic bacterial assemblages, however, can be challenging to characterize. Many microbes are impossible to culture and study in the laboratory, including the bulk of taxa that would be of interest in ecological and environmental studies [27,28]. For this reason, molecular genetic techniques like DNA metabarcoding have been employed to sample and analyze a variety of microbial communities [17,29]. DNA metabarcoding has become a valuable tool for gathering extensive amounts of microbial community data that would have been very costly or nearly impossible a few years ago [30][31][32][33][34]. Recent studies in freshwater environments have shown bacterial assemblage responses to nutrients [26,35,36], flow and fine sediment [37], temperature and pH [36], and conductivity [35].
In this study of streams in southwestern Ohio, USA, we applied DNA metabarcoding to the bacterial assemblages associated with periphyton. Using samples collected weekly from sites representing a broad range of nitrogen and phosphorus conditions within a large watershed, we analyzed bacterial taxa, defined as amplicon sequence variants (ASVs), and the associated water chemistry data with threshold indicator taxa analysis (TITAN; [38]), boosted regression trees (BRT; [39,40]), and gradient forests (GF; [41]) to determine how bacterial assemblages and individual taxa respond to different nutrient concentrations. We also used phylogenetic analysis to show the distribution of high and low nutrient indicator taxa among the major bacterial groups.

Site Selection and Sampling
Twenty-five sites in second-to third-order wadeable streams within the East Fork of the Little Miami River watershed of southwestern Ohio (USA) (Figure 1, Supplementary Table S1) were chosen based on historical monitoring data and site characterizations to comprise a gradient of both nitrogen and phosphorus concentrations across sites while minimizing possible confounding effects of non-nutrient factors. The watershed experiences a temperate seasonal climate and the 25 stream sites covered a range of watershed sizes (15.8-82.3 km 2 , median 37.6 km 2 , mean 40.3 km 2 ± 16.4 km 2 ), percent agriculture coverage (0-88%, median 53%, mean 48% ± 29%), percent forest coverage (9-59%, median 32%, mean 33% ± 14%), and percent urban coverage (0-69%, median 8%, mean 19% ± 20%).
Sites were sampled weekly from early July through September 2016 with another 1-2 samples taken in October. Each site had 13-14 periphyton samples (n = 342 among sites) and 10-12 nutrient samples collected (TP n = 281, TN n = 280 among sites). At each site, water was collected in 1-L acid-washed polypropylene bottles for nutrient analyses and periphyton was scraped and composited from five rocks from five equally spaced transects over a 75-m stream reach collected with a firm-bristled brush on a cordless drill using a 6.7 cm 2 plastic guide. Periphyton samples were stored frozen at −80 °C until thawed for DNA analysis. Water samples for total phosphorus (TP) and total nitrogen (TN) were stored in the dark at 4 °C until analyzed within 24 h or were frozen at −20 °C for analysis within 14 days using a QuikChem 8500 nutrient autoanalyzer system (Lachat Instruments, Milwaukee, WI, USA). Acid persulfate wet digestions were conducted prior to using the molybdate and antimony potassium tartrate reaction and ascorbic acid reduction method to measure TP [42,43]. Total nitrogen was measured using an alkaline wet oxidation persulfate method prior to cadmium reduction [44,45].

Bioinformatics
QIIME2 was used for bioinformatic metagenomics analysis (version 2020; https:// qiime2.org, accessed on 3 Jun 2020) [49]. Sample metadata were created by following the template for the project. Raw reads were imported into QIIME2 followed the "Casava 1.8 paired-end demultiplexed fastq" format. Quality filtration, denoising, merging of paired reads, and chimera removal were carried out by using the DADA2 pipeline [50] implemented in the QIIME2 environment. Filtration parameters were set as: 25 bases trimming at the left side of forward and reverse reads, and truncation of forward and reverse reads up to 250 bases. Raw DNA reads were not clustered into operational taxonomic units (OTUs), and instead kept as amplicon sequence variants (ASVs) to avoid reducing the diversity of potential indicator taxa and because OTUs from clustering are difficult to compare with other data sets. ASVs with only a single DNA sequence, however, were excluded. Although ASVs for this genetic locus could be considered analogous to species, we maintain a more conservative stance and only refer to them as ASVs. Taxonomic assignments were made with SILVA (Supplementary File) [51].

Statistical Analyses
Our objectives were to characterize bacterial assemblage-environment relationships and to identify indicator taxa responsive to TP and TN concentrations. Quantifying and summarizing these assemblage and taxa changes can help identify possible management targets for TP or TN concentrations. For each statistical analysis, we used data from all sampling events for each site because this incorporated the variability in nutrient concentrations over time and provides a fuller representation of nutrient conditions at sites, likely leading to more robust indicator development. All statistical analyses used relative abundances of gene sequence reads for each ASV in a sample.
We used nonmetric multidimensional scaling (NMDS) to visualize changes in bacterial assemblage structure among samples and Spearman correlations to examine relationships of axis scores with TN, TP, watershed land cover, and bacterial metrics (see below). For multivariate statistics, we originally included taxa with >1% relative abundance in > 1% of samples, which reduces the effects of noise introduced by including many rare taxa [52] or DNA sequencing errors such as tag jumping, and is a reasonable and conservative criterion [52][53][54][55][56]. This criterion, however, left us with using <60% of the total gene sequence reads among all samples. So, to compromise between excluding a large proportion of assemblages and including an excessive number of rare ASVs, we included ASVs in the NMDS that first were observed in >5 samples and then those that comprised 75% of the total gene sequence reads when ordered by decreasing average relative abundance (429 of 18,406 ASVs). The requirement for observations in >5 samples also meets a recommended criterion for inclusion in threshold indicator taxa analysis (TITAN). For the NMDS, relative abundances of ASV gene sequence reads were square-root transformed, the Bray-Curtis dissimilarity coefficient was used, and the ordination was rotated to maximize variation along the first axis. We used the vegan package and the metaMDS function [57] in R v. 4.0.3 [58] to conduct the NMDS. Pilgrim et al. Page 5 Water (Basel). Author manuscript; available in PMC 2023 July 30.
We used TITAN to characterize assemblage level changes and to identify ASVs that either increased or decreased with increasing TP or TN concentrations [38]. TITAN uses bootstrapping and indicator species analysis to identify the point along an environmental gradient at which each ASV has the greatest change in its frequency and relative abundance. Response magnitudes are standardized to z-scores to facilitate cross-taxa comparisons.
Bootstrapping also identifies ASVs with consistently strong responses that increase or decrease in >95% of bootstrapped replicates. For ease of communication, we subsequently refer to ASVs that decrease and ASVs that increase with greater TP or TN concentrations as low P or N taxa and high P or N taxa, respectively. The distribution of bootstrapped ASV change points and sum z-scores of all indicator taxa at each observed TP or TN concentration (i.e., partition) shows how taxa and overall assemblages change along the TP or TN gradient [59]. We used R v. 3.4.3 [56] and the 'TITAN2' package [60] to conduct TITAN for TP and TN gradients with 1000 bootstraps. We included all ASVs used in the NMDS for consistency among statistical analyses.
For each sample, we summed the relative abundances of ASVs in each indicator grouplow P, high P, low N, and high N bacteria-which could be used as metrics interpreted in a manner like that of traditional biomonitoring based on other organisms in streams (e.g., [61][62][63]). Spearman correlations were used to describe relationships of bacterial metrics with nutrients and watershed land cover. We also used boosted regression trees to examine relationships between bacterial metrics and multiple predictor variables. Boosted regression tree analysis is a machine learning method that combines results from thousands of regression trees into a model with high predictive performance [39,40]. This analysis can handle correlated predictors, determine the relative importance of predictors, model a variety of responses, and generate partial dependence plots showing the effects of each predictor while controlling for the mean effects of other variables in the model. The partial dependence plots were used to describe nutrient-bacteria relationships. We used a 10-fold cross-validation procedure, which used all data for training and validation steps, to evaluate model performance and to determine mean correlations between fitted and raw values and mean predictive performance of models reported as the percentage of null deviance explained. We used TP, TN, and conductivity as predictors and each of the four bacterial metrics as response variables. Using R v. 3.4.3 [58] and the 'dismo' package [64], we set tree complexity = 2, learning rate = 0.001, and bag fraction = 0.5, which created more than the recommended minimum of 1000 trees for each model [65].
Lastly, we used gradient forest analysis to characterize relationships between multitaxa assemblages, using ASV relative abundances, and multiple environmental predictor variables in one statistical analysis. This analysis provides complementary results to TITAN, which uses multi-taxa responses to a single predictor based on relative abundance and occurrence data, and to boosted regression trees, which model single variable responses to multiple predictors. Gradient forest analysis combines results from random forest models for each ASV with R 2 values > 0 into a measure of overall change in assemblage structure [41]. For each ASV, the split values and their importance (i.e., variation explained by the partitioning) are collated using the R package 'extendedForest' and then used to compute multi-species and assemblage change functions along each environmental gradient and to produce visualizations of results with the R package 'gradientForest' [41]. Split density Pilgrim et al. Page 6 Water (Basel). Author manuscript; available in PMC 2023 July 30. plots, which incorporate the weighted importance of ASV splits and are standardized as a ratio to the density of data, show where the greatest rates of assemblage change occur along environmental gradients (ratios >1). We interpreted peaks as the predictor values at which greatest change occurred, and the increase and decrease in splits surrounding these peaks as denoting unique regions of change along the gradient when ratios were > 1. Random permutations are used to determine the conditional importance of each predictor variable by measuring subsequent degradation in model performance.

Phylogenetic Analysis
Sequence data for the 429 ASVs used in the indicator analyses (see above) were aligned with Clustal W using default settings in MEGA-X [66]. To help assess the potential relatedness of ASVs marked as nutrient indicators, a maximum likelihood (ML) tree was generated in MEGA-X using the generalized time-reversible model with gamma-distributed rates and invariant sites (GTR + I + G). This more complex model of evolution was chosen for the ML analysis given that the ASVs were spread across multiple bacterial phyla. The ML tree was drawn and annotated in iTOL v5 [67].

Descriptive Summary of Data
TP values for water collected at the stream sites ranged from 18 to 889 μg/L (N = 280, median 171 μg/L, mean ± standard deviation = 220 ± 179 μg/L). TN values for water collected at the stream sites ranged from 76 to 6560 μg/L (N = 281, median 621 μg/L, mean ± standard deviation = 778 ± 615 μg/L). The gradients of TP and TN concentrations were continuous and well represented with no gaps or overrepresentation of certain values (see [46] for further details). Historical conductivity values for water collected at the stream sites ranged from 373 to 789 μS/cm (N = 25, median 544 μS/cm, mean ± standard deviation = 561 ± 112 μS/cm). Nonmetric multidimensional scaling showed that changes in bacterial assemblage structure among samples were associated with percent forest and percent agriculture in watersheds and with TP and TN concentrations ( Figure 2). More forested land cover, less agriculture, and lower TP and TN concentrations were associated with decreasing axis 1 scores and increasing axis 2 scores. Increasing relative abundances of low P bacteria and decreasing relative abundances of high P and high N bacteria (see TITAN results below) were associated with decreasing axis 1 scores and increasing axis 2 scores. Relative abundances of low N bacteria increased with decreasing axis 1 scores.

Bacterial Indicators and Relationships with TP and TN
Of the 429 ASVs used in TITAN, 105 were associated with lower TP concentrations (low P bacteria) and 138 were associated with higher TP concentrations (high P bacteria) among sites (Supplementary Table S2). Bootstrapped assemblage change points indicated that the greatest changes in low P bacteria occurred between 110 and 152 μg TP/L, with an overall change point at 136 μg TP/L ( Figure 3a, Table 1). Of the 105 low P ASVs, most had change points within a narrow range of TP concentrations when compared to those of high P ASVs (Figure 3b, Supplementary Table S2). Bootstrapped assemblage change points indicated that the greatest changes in high P bacteria occurred between 137 and 243 μg TP/L, with a change point at 183 μg TP/L (Figure 3a, Supplementary Table S1). Of the 138 high P ASVs, most had broad distributions of bootstrapped change points when compared to those of low P ASVs, indicating that these high P ASVs had gradual changes in their relative abundances and occurrences across a wide range of TP concentrations (Figure 3b, Supplementary Table  S2).
Of the same 429 ASVs used in TITAN, 63 were associated with lower TN concentrations (low N bacteria) and 74 were associated with higher TN concentrations (high N bacteria) among sites (Supplementary Table S3). Bootstrapped assemblage change points indicated that the greatest changes in low N bacteria occurred between 462 and 695 μg TN/L, with an overall change point at 467 μg TN/L ( Figure 3c, Table 2). Bootstrapped assemblage change points indicated that the greatest changes in high N bacteria occurred between 665 and 812 μg TN/L, with an overall change point at 772 μg TN/L ( Figure 3c, Table 2). In general, most low N and high N ASV change points occurred within narrow ranges of the total TN gradient, but a greater proportion of high N ASVs than low N ASVs had broad bootstrapped distributions of change points, indicating that these high N ASVs had more gradual changes in their relative abundances and occurrences than did low N ASVs as TN concentrations increased (Figure 3d, Supplementary Table S3).
Relative abundances of low P and low N ASVs decreased and those of high P and high N ASVs increased as watershed percent agriculture and nutrient concentrations increased (Table 3). Boosted regression tree models explained 54-67% of the deviance in bacterial metrics, and the cross-validated deviance explained ranged from 45 to 56% (Table 4), showing that the models performed similarly when predicting withheld data. TP was the most important predictor for low P ASVs, high P ASVs, but also high N ASVs, while TN was the most important predictor of only low N ASVs. Partial dependence plots identified ranges of TP and TN gradients within which large changes in bacterial metrics occurred ( Figure 4). Relative abundances of low P bacteria had a small decrease from 42 to 61 μg TP/L followed by their largest decrease from 108 to 184 μg TP/L and smaller decreases from 241 to 311, 352 to 378, and 494 to 509 μg TP /L. High P bacteria had a small increase in relative abundance from 36 to 82 μg TP/L followed by their largest increase from 94 to 184 μg TP/L and another small increase from 287 to 356 μg TP/L. Relative abundances of low N bacteria had a large decrease from 271 to 459 μg TN/L followed by a smaller decrease from 459 to 859 μg TN/L. High N bacteria gradually increased in relative abundance from 279 to 756 μg TN/L. Gradient forest analysis included 278 ASVs with R 2 values > 0 (maximum = 0.57, mean = 0.13, median = 0.10), including 61 with R 2 > 0.20 (Supplementary Table S4). Peaks in the standardized split density plot showed that the greatest changes in bacterial assemblage structure occurred between 24 and 75, 75 and 152, and 250 and 325 μg TP/L, with smaller changes occurring between 174 and 217 and 325 and 386 μg TP/L ( Figure 5). For TN, a gradual change in bacterial assemblage structure occurred between 193 and 928 μg/L, with a small second response around 1458 μg/L ( Figure 5).

Summary of Changes in Bacterial Assemblages
Complementary results from these statistical analyses showed that bacterial assemblages were strongly associated with TP and TN gradients. Results were collated to identify TP and TN concentrations at which notable changes in bacterial assemblages occurred ( Figure  6, Tables 1 and 2). In general, changes along TP and TN gradients were marked by shifts from dominance by low nutrient ASVs to high nutrient ASVs with major points of overall community change being highlighted by TITAN change points, steep portions of partial dependence plots, and peaks in gradient forest split density plots. Bacterial assemblages began having small changes from 24 to 82 μg TP/L with midpoints of multiple responses occurring around 50 μg TP/L. This initial change was followed by larger changes, including continued decreases in relative abundances of low P taxa, increases in those of high P taxa, TITAN change points, and additional large changes in gradient forest analysis between 110 and 195 μg TP/L. The final major changes in bacterial assemblages occurred between 275 and 365 μg TP/L, after which high P taxa dominated. The body of statistical evidence suggests that the bacterial assemblage structure experienced large, gradual changes with initial large decreases in relative abundances of low N ASVs from 275 to 565 μg TN/L followed by the greatest increase in relative abundances of high N ASVs and bootstrapped change points for high N taxa from 565 to 855 μg TN/L.

Taxonomy and Phylogenetic Analysis
Phylogenetic analysis of the 429 ASVs determined how the different types of indicators are spread amongst the various bacterial orders (Figure 7). Most bacterial phyla and classes were recovered as monophyletic with a few exceptions (e.g., Caulobacterales nested within Rhizobiales, and Rhodobacterales and Pseudomonadales are not monophyletic), though investigating the evolutionary history of bacteria was not the goal of the analysis. All clades with more than four ASVs had at least one ASV that was found to be an indicator of nutrient condition. Many ASVs responded to both TP and TN concentrations with 67 ASVs as indicators of both high P and high N and 45 ASVs as indicators of both low P and low N.
Within the Proteobacteria, the order Rhizobiales had the most ASVs (N = 79) and also the most ASVs found to be indicators in TITAN (N = 49) with 18 ASVs that were both indicators for high P and high N conditions, 10 ASVs that were indicators of high P, 1 ASV that was a high N indicator, 4 ASVs that were both indicators for low P and low N, 12 ASVs that were indicators of low P, and 4 ASVs that were indicators of low N (

Discussion
Our analyses showed distinct changes in bacterial assemblages associated with gradients of TP and TN concentrations, and considerable numbers of ASVs were significant responders to the phosphorus and nitrogen conditions (from TITAN, 243 of 429 for TP and 137 of 429 ASVs for TN, and from gradient forest analysis, 278 of 429 ASVs). Metabarcoding resulted in more than 18,000 bacterial ASVs, of which the 429 most frequent, comprising 75% of the total relative abundance represented over 15 different bacterial phyla. This work adds to the growing field of genetic characterization of bacterial communities for indicators of ecological or environmental conditions in streams and can inform the future applications of bacterial indicators in monitoring programs [26,36].
Our study identified many bacterial taxa that have significant relationships with either or both TP and TN. Most of these relationships show responses to TP over a concentration range of  μg/L and to TN over a range of 275-855 μg/L ( Figure 5), which is similar to the ranges found for diatom responses in this watershed  μg TP/L and 150-850 μg "TN/L), though diatom assemblages tended to respond at much lower TP concentrations [41]. In general, TITAN results suggested that tow nutrient bacterial ASVs exhibited Pilgrim et al. Page 10 Water (Basel). Author manuscript; available in PMC 2023 July 30.

EPA Author Manuscript
EPA Author Manuscript sharp declines in their relative abundances and occurrences as nutrient concentrations increased. This pattern could be due to greater numbers of high nutrient ASVs that, despite mostly having gradual increases across wider ranges of nutrient conditions, quickly and cumulatively outcompeted low nutrient ASVs as nutrient concentrations increased. While the bacterial indicator ASVs described herein likely respond to nutrient concentrations, they also could be responding to nutrient-related changes in other benthic community members [68][69][70][71].
Non-photosynthesizing bacterial groups fill biochemical roles that are very different from those of cyanobacterial or algal members of periphyton communities. For example, Sphingomonadales are known for secreting extracellular matrices as they proliferate, Actinobacteria often function as decomposers, and some Rhizobiales (e.g., Rhizobium, Bradyrhizobium, Mesorhizobium, Sinorhizobium Azorhizobium, and Allorhizobium) are major players in symbiotic N2-fixation in nature [72]. Bacteria also are important contributors to nutrient cycling within periphyton and produce extracellular enzymes that are important to phosphorus, nitrogen, and carbon uptake from organic sources [19], and their enzyme activity can be stimulated by algal productivity associated with greater amounts of nutrients [73][74][75][76]. More studies are necessary to uncover whether the bacterial indicators found herein are responding directly or indirectly to TP and TN conditions, but that does not change their potential importance as indicators for biomonitoring. Our results are informative for developing nutrient indicators in other regions, and although the data and analyses from this work are based on sites within a single watershed, the breadth of temporal coverage and the common watershed scale of monitoring programs increase the likelihood that these approaches are applicable to other, similar temperate watersheds.
DNA metabarcoding applied to biological monitoring is not without its limitations or biases.
Complicating factors for the further development of these molecular genetic tools include concerns related to sample preservation [77], DNA extraction method [78,79], PCR bias [80,81], number of gene copies [82], uniformity across bioinformatic pipelines [83], and status of genetic databases [84]. However, research advances continue to address these concerns and minimize their impacts in molecular studies. For most taxa studied with DNA metabarcoding for environmental applications, critics often discuss comparisons of genetic data to traditional taxonomic methods, but such a concern does not apply to bacteria. Isolation and taxonomic identification of these periphyton-associated bacteria would require a myriad of culture methods. It is widely accepted that the percentage of bacteria that can be cultured is minimal relative to the estimated number of total bacterial species richness (10 7 -10 9 ) [85,86]. DNA-based methods for the identification of bacterial communities still present the best opportunity to generate useful data in a timely and cost-effective manner.
Recent studies have demonstrated the potential utility of DNA metabarcoding for assessing responses of bacterial communities to anthropogenic impacts in multiple environmental contexts, and increasingly these responses are being captured in biotic indices informative to managers and decision-makers [87][88][89]. For instance, benthic bacterial communities have been shown to mirror responses of established macrofaunal indicators to nutrient enrichment associated with salmon farming [90], and 16S rRNA gene metabarcoding has been incorporated into a multi-trophic Metabarcoding Biotic Index found to be strongly correlated Pilgrim et al. Page 11 Water (Basel). Author manuscript; available in PMC 2023 July 30. with those disturbances [91]. Biotic indices based solely on bacterial community profiles are similarly strongly correlated with impacts of salmon aquaculture, and those indices are sufficiently robust to be replicable across multiple independent laboratories [92], suggesting potential utility for routine biomonitoring. Of particular interest with respect to the current study is the fact that "de novo" methods for identifying bacterial indicators-i.e., taxonomyfree approaches based on a statistical analysis of ASVs-performed better in these contexts than a taxonomy-based approach in which 16S DNA sequence data was used to calculate a previously developed bacterial biotic index [93]. In addition, complementary analyses associated with our work also found that bacteria-nutrient relationships remained strong over time, except immediately following hydrologic disturbance, and that bacterial metrics best represent long-term nutrient concentrations, though they were still correlated with short-term concentrations as well [94]. In addition, comprehensive and complementary analyses of temporal variability among sites in our study found that bacteria-nutrient relationships remained strong over time, except immediately following hydrologic disturbance, and that bacterial metrics best represent long-term nutrient concentrations, though they also were still correlated with short-term concentrations [94]). These additional analyses also showed that indicators developed using multiple samples over time performed well despite temporal variability in both bacterial assemblages and nutrient concentrations.
Bacterial DNA metabarcoding has also proven useful to assess a variety of other anthropogenic environmental stressors. Recent studies demonstrate that indices based on bacterial community structure respond strongly to environmental impacts of oil and gas drilling and production [95], urban pollution in coastal mangrove forests [96], and metal pollution associated with cement plants [97]. Although many of these studies focus on benthic bacterial assemblages in estuarine and marine environments, similar approaches have been developed for freshwater habitats, including a recent study developing biological indicators of river health based on changes to bacterial communities in periphyton [98]. Our research thus adds to a growing literature illustrating the potential value of 16S metabarcoding for routine biomonitoring and provides further evidence of the utility of taxonomy-free approaches for developing informative indicators based on bacterial responses to environmental stress.
While our work highlights the potential utility of DNA metabarcoding applied to periphyton bacteria as nutrient indicators, these results suggest that other areas of metagenomic research would likely prove useful. Future studies could investigate not only the diversity of bacterial taxa present but also the genetic capabilities of those taxa related to processing nutrients. Studies of RNA from periphyton bacteria coupled with nutrient chemistry would likely show which nutrient processing genes are activated in response to nitrogen or phosphorus availability [99,100] (Graves et al. 2016;Dai et al. 2019) while also providing data on which bacterial taxa are functioning in these conditions [101,102] (Leff et al. 2015;Li et al. 2018). Analysis of these bacterial biochemical pathways could also be informative to how these bacteria interact with other members of the periphyton community. To date, microbial metagenomic nutrient processing studies have not been coupled with nutrient indicator analyses, but such work would be a valuable addition to environmental management tools applied to nutrient pollution. Pilgrim et al. Page 12 Water (Basel). Author manuscript; available in PMC 2023 July 30.

Conclusions
Periphyton and other algal-based communities continue to be high-priority organisms collected as part of monitoring programs for their use as indicators of nutrient pollution in aquatic ecosystems [17,103]. Our analyses demonstrated that DNA metabarcoding applied to periphyton-associated bacteria provides similar results to DNA metabarcoding applied to benthic diatoms for quantifying nutrient responses in a watershed affected by agricultural and urbanization stresses. Multiple statistical methods identified assemblage changes along TP and TN gradients that could serve as possible targets for decision-makers tasked with mitigating nutrient inputs. While the results indicated that potential TP concentration targets based on bacterial indicators were higher than those found for diatom indicators in the same watershed, these responses still provide useful ecological insights into how nutrient pollution affects aquatic ecosystems. These targets may be informative for the development of nutrient criteria related to land use, best management practices for reducing nutrients in watersheds, and discharge permits. Further development of technologies, methods, databases, and analytical methods will continue to improve the capabilities of applying genetic/genomic approaches to biological monitoring.

Supplementary Material
Refer to Web version on PubMed Central for supplementary material.  Nonmetric multidimensional scaling ordination showing Spearman correlations of axis scores with total phosphorus (TP), total nitrogen (TN), watershed percent forest, watershed percent agriculture, and relative abundances of low P, low N, high P, and high N bacterial ASVs. Vectors are scaled to span the range of possible correlation coefficients (−1 to 1) along NMDS axes.   in the cumulative frequency curves, and multiple ASV change points occurring within a narrow range of TP or TN concentrations suggest nutrient thresholds at these concentrations. Broad peaks in sumZ scores, gradual increases in cumulative frequency curves, and gradual addition of ASV change points indicate more gradual responses to nutrient concentration and a longer gradient of community change.      regression tree analysis, GF = gradient forest analysis). GF2 has no horizontal bars because it only briefly exceeded the density-of-splits to density-of-data ratio of 1.  Phylogenetic tree of the 429 ASVs comprising 75% of the total relative abundance among all samples. Indicator ASVs from TITAN are marked. Major bacterial taxonomic groups are color coded in the legend in the order they appear in the tree, clockwise from the cyanobacteria near the top. Groups with only 1 or 2 members are unmarked.     Results from boosted regression tree models for each bacterial metric showing deviance explained as a percentage of the null deviance for observed data and of deviance explained using cross-validated (CV) data. EC = electrical conductivity. Nutrient indicator ASVs by bacterial phylum/order. Water (Basel). Author manuscript; available in PMC 2023 July 30.