Metagenomic Profiling of Microbial Pathogens in the Little Bighorn River, Montana

The Little Bighorn River is the primary source of water for water treatment plants serving the local Crow Agency population, and has special significance in the spiritual and ceremonial life of the Crow tribe. Unfortunately, the watershed suffers from impaired water quality, with high counts of fecal coliform bacteria routinely measured during run-off events. A metagenomic analysis was carried out to identify potential pathogens in the river water. The Oxford Nanopore MinION platform was used to sequence DNA in near real time from both uncultured microbes and a coliform-enriched culture of microbes collected from a popular summer swimming area of the Little Bighorn River. Sequences were analyzed using CosmosID bioinformatics and, in agreement with previous studies, enterohemorrhagic and enteropathogenic Escherichia coli and other E. coli pathotypes were identified. Noteworthy was the detection and identification of enteroaggregative E. coli O104:H4 and Vibrio cholerae serotype O1 El Tor; however, cholera toxin genes were not identified. Other pathogenic microbes, as well as virulence genes and antimicrobial resistance markers, were also identified and characterized by metagenomic analyses. It is concluded that metagenomics provides a useful and potentially routine tool for in-depth identification of microbial contamination of waterways and, thereby, for protecting public health.


Introduction
Water is essential for human life and productivity, yet both water quality and security are increasingly under threat globally [1][2][3]. Unfortunately, while wealthy countries are able to afford effective water treatment, poorer nations are severely hampered by a lack of resources for safe water to protect public health. In both cases, sources of contaminants entering waterways are not sufficiently addressed [2].
Communities within the Crow Nation in south central Montana have been aware of deteriorating water quality of the Little Bighorn River for many years [4]. Concerned members of the Crow Nation

Materials and Methods
Water samples were collected from the Crow Fair swim hole of the Little Bighorn River, Crow Agency, Montana, on July 16, 2017. The swim hole is located at latitude/longitude of 45°36′1″N, 107°27′12″W, and is a popular summer recreational site used by children and adults of the population of ca. 1,600 residents of Crow Agency. During sampling, several children were observed swimming 100 meters upstream of the sampling site. Four samples were collected at ten-minute intervals over the course of 30 minutes, then pooled. Samples were transported on ice to Montana State University for processing by two different methods. First, 100 mL aliquots from each of the four consecutive samplings were pooled (400 mL total) and filtered using 47 mm, 0.45 µm filters to collect particulates. Filters were processed using the PowerWater DNA isolation kit (Qiagen). (Technical notes: The PowerWater kit was chosen in part due to its incorporation of reagents to remove inhibitors that may interfere with downstream PCR and DNA sequencing reactions. Choice of 0.45 µm filters as opposed to 0.22 µm filters was due to the turbidity of the water samples. We followed the manufacturer's recommendation to use 0.45 µm filters for turbid water samples to reduce clogging and allow a greater volume of water to be filtered than would be possible with the more restrictive 0.22 µm filters. We readily acknowledge that this choice may have reduced the variety of bacteria detected, since during the initial stages of filtering, smaller bacteria would be lost through the larger 0.45 µm pores, but also are aware that as the filters clogged, many smaller cells should have been captured.)
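For mapping or GIS use, the sampling location can be converted to decimal degrees. The following is a minimal sketch, assuming the coordinates above read as degrees, minutes, and seconds (45, 36, 1 N and 107, 27, 12 W); the function name `dms_to_decimal` is illustrative and not part of the original methods.

```python
def dms_to_decimal(degrees: float, minutes: float, seconds: float,
                   hemisphere: str) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and West hemispheres are returned as negative values.
    """
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere in ("S", "W") else value


# Coordinates as read from the text (assumed reading of the garbled symbols).
latitude = dms_to_decimal(45, 36, 1, "N")
longitude = dms_to_decimal(107, 27, 12, "W")
print(f"{latitude:.4f}, {longitude:.4f}")
```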
Modifications to the PowerWater kit protocol were made as follows: Filters were placed in a 5 mL PowerWater kit tube, and 1.5 mL of PowerWater kit buffer (instead of 1 mL per manufacturer's instruction) and Metapolyzyme (20 µL of a 10.0 mg mL⁻¹ solution in sterile PBS, pH 7.5; Sigma #MAC4L) were included in the lysis step to enhance digestion of extracellular material and release of DNA. (Note: In our experience, use of 1.5 mL of the first PowerWater kit buffer was found to increase the yield of DNA compared to using only 1 mL.) The tube was vortexed for ca. 60 s (minimizing fragmentation) and incubated overnight at 37 °C, with periodic rotation and agitation. After overnight incubation, DNA extraction was continued, following the PowerWater kit manufacturer's instructions.
A separate DNA extraction procedure was carried out to harvest DNA for the detection of coliform bacteria. To begin, three technical replicates of ca. 50 mL river water were filtered under vacuum and the filters placed on m-ColiBlue24 plates [30] and incubated overnight at 37 °C for coliform counts. Following the manufacturer's protocol, membrane filters of 0.45 µm pore diameter were used [30] for the m-ColiBlue24 assay. It is noteworthy that consistency was maintained for both filtration procedures, in that 0.45 µm pore filters were also used for filtering the larger 400 mL volume of river water for the uncultured, non-selective procedure described above. After the resulting blue E. coli colonies on the m-ColiBlue24 plates were enumerated, one filter yielding colony growth was selected for DNA extraction, using the PowerWater kit and the amended protocol described above. A second filter with colony growth was lifted with forceps, replica plated on CHROMagarO157 agar (CHROMagar), and incubated overnight at 37 °C. The appearance of mauve-colored colonies indicated putative EHEC.
Because of the heightened degree of public health concern regarding E. coli serotypes O157:H7 and O104:H4, and V. cholerae O1 El Tor, coverage plots and completion estimates were also generated as an additional indication of confidence in identifying these potential pathogens. GraphMap [32] was used for mapping and coverage calculations for these sequences.
DNA preparations from both river water (without selective growth) and m-ColiBlue24 selection samples were examined by PCR for the presence of eae and Stx genes that are indicative of EHEC and EPEC as previously described [17,33]. Presence of eae is characteristic of both EHEC and EPEC; on the other hand, EHEC contains Stx genes while EPEC lacks Stx genes.
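The eae/Stx rule stated above can be written as a small decision function. This is an illustrative sketch only: the function name and return labels are assumptions, and the rule strictly applies to a single isolate. In DNA from a mixed community, detecting both genes is consistent with EHEC but does not prove that one cell carries both.

```python
def classify_pathotype(eae_detected: bool, stx_detected: bool) -> str:
    """Apply the eae/Stx PCR rule described in the text.

    eae present, Stx present -> consistent with EHEC
    eae present, Stx absent  -> consistent with EPEC
    eae absent               -> not covered by the rule in the text
    """
    if eae_detected and stx_detected:
        return "EHEC"
    if eae_detected:
        return "EPEC"
    return "unclassified"


# Example: a sample in which both markers amplify.
print(classify_pathotype(eae_detected=True, stx_detected=True))
```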

Metagenomic Sequences Generated from DNA Prepared Directly from River Water without Selective Growth on m-ColiBlue24
Environmental DNA from microorganisms collected by filtration of water from the Little Bighorn River was subjected to shotgun sequencing using the Oxford Nanopore MinION platform. Sequencing generated 397,884 reads comprising ~1.1 Gbp with an average read length of 2760 bp. CosmosID analysis of the DNA sequences indicated the presence of both Eukarya and Bacteria in the river water community. These included eukaryotic protists (Table 1), fungi (Table 2), and bacteria (Table 3 and Figure 1). Several eukaryotic genera that are of potential concern to human health, including Acanthamoeba, Leishmania, Candida, and Rhizomucor, were identified in the analyses (Tables 1 and 2). Bacteria of concern to human health, Acidovorax and Aeromonas salmonicida, were also identified (Table 3). Limnohabitans was the dominant genus in the filtered river biomass, followed by Actinobacterium, a genus that includes important members of a healthy gut microbiome.
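The reported yield can be sanity-checked from the read count and average read length, since total bases equal the number of reads multiplied by the mean read length. A minimal sketch (the function name `total_bases` is illustrative):

```python
def total_bases(read_count: int, mean_read_length_bp: float) -> float:
    """Total sequenced bases = number of reads x mean read length."""
    return read_count * mean_read_length_bp


# Values reported for the unenriched river water run.
reads = 397_884
mean_length_bp = 2760

yield_gbp = total_bases(reads, mean_length_bp) / 1e9
print(f"Estimated yield: {yield_gbp:.2f} Gbp")
```

The product is about 1.10 Gbp, which agrees with the ~1.1 Gbp stated above.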

Metagenomic Analysis of DNA Prepared from Filter after Selective Growth on m-ColiBlue24 Medium
The average concentration of E. coli detected on filters incubated overnight on m-ColiBlue24 medium was 66 colony forming units (CFU) per 100 mL of water. This concentration is below the limit of 126 CFU per 100 mL established by the EPA [43] for recreational water to be considered safe for swimming.
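Plate counts from a filtered volume are normalized to CFU per 100 mL before comparison with the recreational limit. The sketch below shows that arithmetic. The per-replicate colony counts are hypothetical (the individual counts are not reported here) and were chosen only so that three ca. 50 mL replicates average to the reported 66 CFU per 100 mL.

```python
EPA_LIMIT_CFU_PER_100ML = 126  # recreational water limit cited in the text


def cfu_per_100ml(colony_count: int, volume_filtered_ml: float) -> float:
    """Normalize a plate count to CFU per 100 mL of filtered water."""
    if volume_filtered_ml <= 0:
        raise ValueError("filtered volume must be positive")
    return colony_count * 100.0 / volume_filtered_ml


# HYPOTHETICAL colony counts for three 50 mL technical replicates.
hypothetical_counts = [31, 33, 35]
concentrations = [cfu_per_100ml(c, 50.0) for c in hypothetical_counts]
average = sum(concentrations) / len(concentrations)

print(f"Average: {average:.0f} CFU/100 mL")
print("Below limit" if average < EPA_LIMIT_CFU_PER_100ML else "At or above limit")
```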
DNA was prepared and sequenced from colonies grown overnight on the filters with an m-ColiBlue24 selection. A total of ~1.6 Gbp of data was generated, comprising 1,261,165 sequence reads with an average length of 1260 bp. Several bacterial species, some of which are important human pathogens, were detected that had not been identified in the native river water metagenome (Table 4 and Figure 2). In addition to numerous opportunistic pathogens, strains of toxigenic E. coli, Shigella spp., and Vibrio cholerae were also detected. A number of bacteriophages (Table 5), antimicrobial resistance (AMR) gene markers (Table 6), and virulence genes (Table 7) were identified from the metagenomic analysis after m-ColiBlue24 selection. Several of these bacteriophages, AMR markers, and virulence genes are relevant to human health.
The eae and Stx genes were not detected in DNA prepared from unenriched river water, whereas both genes were detected in DNA isolated from a filter cultured on selective m-ColiBlue24 medium (data from PCR not shown). The presence of Stx2 converting phage sequences was also indicated by the metagenomics analysis (Table 5). Colonies grown on m-ColiBlue24 media and replica plated onto CHROMagarO157 media gave rise to scattered, small spots of mauve growth, indicating the presence of EHEC bacteria.
Five markers of antimicrobial resistance (AMR) at 18 different gene loci were identified in the metagenomic analysis of the m-ColiBlue24 selection sample (Table 6). These markers are related to efflux of antibiotics, resistance to the beta-lactam class of antibiotics, as well as resistance to ampicillin, fluoroquinolones, and polymyxins.
Several virulence genes that contribute to the ability of microbes to cause disease were identified (Table 7). These genes code for virulence factors related to attachment, acid resistance, enhanced serum survival, the competitive advantage against other microbes, and iron acquisition capability.
Calculations of DNA sequencing coverage and depth of coverage were made by mapping reads to the genomes of three pathogens of major public health significance. Reads were mapped to E. coli O104:H4, E. coli O157:H7 Sakai, and V. cholerae O1 El Tor with 96%, 95% and 93% coverage (completion) of genomes, respectively (Table 8 and Figures 3-5). The depth of coverage was 52×, 50×, and 36×, respectively, for the three genomes (Table 8). Based on genome reporting standards proposed by Bowers et al. [69], these genomic coverages would meet the criterion for high quality metagenome-assembled genomes for these three species. Given that the sequences were mapped to reference genomes with high fidelity, there are unlikely to be multiple, heterogeneous populations for each species. Consequently, these pathogenic populations were present in the river water, and were detectable after selection and enrichment on m-ColiBlue24 media.
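Breadth of coverage (the "completion" percentage) and depth of coverage can both be derived from a per-base depth profile of the reference genome. The following is a minimal sketch of those two definitions, not the GraphMap-based workflow used in the study. It assumes per-position depths are already available from some upstream tool, takes depth as the mean over all reference positions, and uses a toy ten-base profile.

```python
from typing import Sequence


def breadth_of_coverage(depths: Sequence[int], min_depth: int = 1) -> float:
    """Fraction of reference positions covered by at least min_depth reads."""
    if not depths:
        raise ValueError("depth profile is empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


def mean_depth(depths: Sequence[int]) -> float:
    """Mean read depth across all reference positions (assumed definition)."""
    if not depths:
        raise ValueError("depth profile is empty")
    return sum(depths) / len(depths)


# TOY per-base depth profile for a 10-base reference (illustrative only).
toy_depths = [0, 0, 3, 5, 4, 6, 2, 0, 1, 9]
print(f"Breadth: {breadth_of_coverage(toy_depths):.0%}")
print(f"Mean depth: {mean_depth(toy_depths):.1f}x")
```

For a real genome the same two functions would be applied to a profile with one entry per reference base.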

Discussion
This study describes a metagenomic analysis of water samples collected from a popular swimming site along the Little Bighorn River during the summer of 2017. This work was predicated on previous detection and identification of EHEC and EPEC bacteria in water samples collected from the Little Bighorn River [17] and ongoing concerns of the local community related to water quality and safety. Initial metagenomic analysis of total DNA isolated from filtered river water indicated the presence of species and strains of typical freshwater microorganisms, including both culturable and non-culturable microorganisms. Distinguishing between culturable and non-culturable microbial strains is important, since a study of freshwater lake bacteria estimated that only approximately 0.25% of the total bacterial population was culturable [70]. Indeed, most bacterial populations in these environments are viable but not culturable (VBNC) using standard bacteriological culture methods [71][72][73]. A variety of both naturally occurring and potentially pathogenic bacterial species have been shown to enter the VBNC state in response to environmental stress, so that culture-based environmental surveillance can miss a significant percentage of a population with relevance to public health.
A second metagenomic analysis was also performed using DNA prepared from a filtered water sample after incubation on m-ColiBlue24 medium overnight to allow for selection of coliforms and related species. This two-pronged approach was taken to enhance detection and identification of pathogens, the growth of which may be inhibited by other river bacteria, and therefore not previously recognized in earlier studies targeting detection of coliforms in the river water [17].
Metagenomic analysis of DNA extracted from filters without growth on selective medium revealed a rich diversity of microorganisms, the predominant species of which are presented in Tables 1-3. The absence of E. coli in DNA prepared without enrichment on a selective medium was not surprising given the relatively small number of reads and because enrichment on selective media yielded only 66 CFU/100 mL of E. coli in overnight culture on m-ColiBlue24 medium. This medium has been approved by the EPA [30,74] as a sensitive method for detecting and monitoring fecal coliform (E. coli) bacteria in fresh water, where a count of 126 CFU/100 mL (calculated as a geometric mean for samples collected over a 30-day period) for E. coli is the maximum permissible limit for recreational waters [43]. Lack of detection of E. coli by metagenomic analysis without selective growth is attributed to the overwhelming abundance and diversity of non-E. coli microorganisms that were present. The proportion of E. coli present in the water samples was relatively small in comparison to the high microbial load on selective media, as evidenced by results of the analysis of the m-ColiBlue24-derived metagenome, which revealed Gammaproteobacteria and coliform bacteria in significant abundance. Our choice of 0.45 µm pore diameter membrane filters was based on following the manufacturer's protocol [30] for the EPA-approved m-ColiBlue24 method, as well as water sample turbidity. We acknowledge that use of this pore size instead of a smaller pore diameter filter could have resulted in our missing smaller sized microorganisms of public health significance. However, species and strains (Table 4) that were identified, including many serotypes of diarrheagenic bacteria, such as EHEC O157:H7, were also identified in an earlier study [17].
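Because the 126 CFU/100 mL limit is defined on a geometric mean of samples collected over a 30-day period, a single-day result is not directly equivalent to a compliance determination. The sketch below illustrates the geometric-mean comparison. The sample values are hypothetical and the function names are illustrative; only the 126 CFU/100 mL threshold comes from the text.

```python
import math
from typing import Sequence

LIMIT_CFU_PER_100ML = 126  # geometric-mean limit cited in the text


def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean computed as exp(mean(log(x))).

    All values must be positive; zero counts would need a substitution
    rule that is not specified in the text, so they are rejected here.
    """
    if not values:
        raise ValueError("no values supplied")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def exceeds_limit(values: Sequence[float],
                  limit: float = LIMIT_CFU_PER_100ML) -> bool:
    """True if the geometric mean of the samples is above the limit."""
    return geometric_mean(values) > limit


# HYPOTHETICAL E. coli results (CFU/100 mL) from one 30-day period.
hypothetical_samples = [66, 40, 210, 95, 130]
print(f"Geometric mean: {geometric_mean(hypothetical_samples):.0f} CFU/100 mL")
print("Exceeds limit" if exceeds_limit(hypothetical_samples) else "Within limit")
```

A geometric mean damps the influence of occasional high counts, such as those measured during run-off events, compared with an arithmetic mean.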
DNA sequences indicative of E. coli serotype O104:H4 and V. cholerae O1 El Tor, both human pathogens of significant interest, were identified (Table 4). Of particular concern, E. coli O104:H4 is an emerging pathogen that first received widespread attention in 2011 as the causative agent of the largest outbreak of Shiga toxin-related disease [75] recorded to date [50]. In Germany and surrounding areas, an O104:H4 outbreak strain caused 3,842 cases of illness, including 18 deaths. Among those stricken, 855 people developed hemolytic uremic syndrome (HUS), leading to an additional 35 deaths [50]. The disease-associated O104:H4 outbreak strain is a novel variant of enteroaggregative E. coli (EAEC) that acquired the Shiga toxin gene that is characteristic of EHEC.
Detection of V. cholerae sequences in the Little Bighorn River is not surprising. V. cholerae, the causative agent of cholera, is an aquatic bacterium with worldwide distribution [76], which may be due to globalization and may indicate changing human demographics. Recently, V. cholerae caused an outbreak of disease in Haiti that had not been seen in 100 years [77,78]. Several virulence genes have been reported as essential for these bacteria to cause an outbreak of cholera, most notably the ctxA and ctxB genes encoding cholera toxin and carried by the bacteriophage CTXϕ. This bacteriophage was not detected in this study (Table 5). However, a cluster of genes (VCA0107, VCA0109, VCA0111, VCA0121, vgrG-3, and vasH; see Table 7) associated with the type VI secretion system (T6SS), an important virulence factor of many Gram-negative pathogenic bacteria, including V. cholerae [79], was detected. In the related species V. proteolyticus, the T6SS includes cytotoxic effectors that target both prokaryotic and eukaryotic cells [80]. In V. cholerae, the T6SS has been shown to kill other bacterial species, releasing DNA that in turn can be taken up in the process of horizontal gene transfer (HGT) by naturally competent Vibrio bacteria [67]. Genes taken up by HGT may enhance the antibiotic resistance and virulence potential of Vibrio cells, highlighting the evolutionary potential of pathogenic bacteria in natural environments to become more virulent.
Of relevance to human health, bacteriophage-encoded genes that enhance the pathogenicity of host bacteria were also detected. Two types of Shigella-specific phage, SfII and SfIV, allow for O antigen modification and increased antigen variation [59,60]. The Stx2 converting phage of E. coli O157:H7 and other related Shiga toxigenic E. coli (STEC) encodes the Stx2 protein, an important virulence factor causing lysis of host cells and contributing to hemolytic uremic syndrome [61].
Detection of several AMR markers in the m-ColiBlue24 metagenome (Table 6) is relevant as the worldwide spread of antibiotic resistance is increasingly recognized as a major public health threat, compromising treatment of a variety of infectious diseases [81]. Widespread use of antibiotics in human and veterinary medicine has contributed to an increasing pool of bacteria harboring AMR genes and these bacteria, in turn, are now widely distributed in agricultural products, animals, humans, and the environment [82].
The metagenomic analyses presented in this study indicate that a variety of potential human disease-related pathogens and AMR markers were present and detectable in water samples collected from the Little Bighorn River during the summer of 2017. The presence of gene markers for E. coli O157:H7 (Tables 4 and 5), a human pathogen of significant concern, is in agreement with earlier findings of Hamner [17]. Presence of Shiga toxin gene markers indicated by both PCR (data not shown) and metagenomic analysis, as well as mauve-colored colony growth on CHROMagarO157 medium, a differential/selective medium and indicative test for O157:H7, provide both genetic and phenotypic evidence for continued presence of E. coli O157:H7 bacteria in the Little Bighorn River. As it is understood that the major reservoir of O157:H7 bacteria is cattle and other ruminants [83], livestock ranching operations along the length of the Little Bighorn River, including a large concentrated animal feed operation close to the headwaters of the river, provide likely sources of this contamination to the watershed.
Penicillin derivatives are widely used in animal husbandry and hence ampicillin and beta-lactamase resistance might be expected to coincide with the presence of animal-associated pathogens [84]. However, tetracyclines tend to be more broadly used, and the absence of any tetracycline resistance gene markers would suggest further work is needed to identify sources of contamination. It is not currently known which antibiotics are primarily used in the Little Bighorn watershed.
Animal experiments with the E. coli O157:H7 bacteria or other potential pathogens identified in the present study were not conducted. Therefore, it is unclear whether isolates from the Little Bighorn River are capable of causing disease. Nevertheless, the presence of E. coli O157:H7 bacteria detected in the river consistently and over several years, along with identification of other known pathogens, is of concern. Moreover, given the detection of AMR genes, the potential for horizontal gene transfer, for the evolution of pathogens with enhanced pathogenic potential, and for the spread of AMR cannot be ignored [85].
The metagenomics analyses carried out in this study yielded results that strongly suggest further metagenomic analysis should be conducted, using both longitudinal and seasonal study designs to provide statistically significant data to inform public health efforts.
The Crow Environmental Steering Committee has endorsed continued study, with a focus on both the Crow Fair swim hole site of the present study and upstream sites to determine the extent and potential sources of microbial contamination. The staff of the Crow Water Quality Project continue to educate the community on water quality and environmental health issues. Since the local tribal college is a two-year institution with limited facilities and resources, our use of the portable and relatively affordable MinION sequencing platform may serve as a proof of concept for introducing students at smaller tribal colleges to DNA sequencing technology as a means of monitoring water quality. Use of the MinION system may be applicable to the study of genomics in a teaching and research setting where the cost of other more expensive sequencing technologies is prohibitive.

Conclusions
Waterborne disease continues to threaten human health worldwide. Many regulatory agencies employ coliform testing of water as an indication of the extent of fecal contamination and disease risk. Even when the concentration of coliform bacteria is within an acceptable level, this method does not identify specific microbes that may be pathogenic at a very low dose-of-infectivity. In this study, we test the feasibility of using a highly portable DNA sequencing device, that may in the future be readily deployed for routine monitoring of water quality outside of research laboratory settings, for detection and metagenomic analysis of waterborne disease pathogens present in a river affected by fecal contamination from cattle ranching and leaking sewage systems. We demonstrate that even at an "acceptable" level of fecal coliform bacteria deemed to be safe for human recreational use of a river, seemingly rare and unexpected (for rural Montana) pathogens, such as E. coli O104:H4 and V. cholerae, as well as pathogens with a low dose-of-infectivity on the order of 1-10 cells, e.g., E. coli O157:H7, can be detected using metagenomic analysis.
As portable DNA sequencing devices continue to be refined and made more affordable, and as metagenomics software and analysis are fully integrated with these sequencing platforms, it can be envisioned that real-time surveillance for waterborne pathogens, virulence genes, and AMR gene markers will be incorporated into environmental monitoring to protect human health. The present study serves as a proof of concept of the utility of such an approach, by demonstrating the ability to detect not only pathogenic microorganisms, but also virulence and AMR genes. Use of traditional methods to screen for pathogens and phenotypic traits requires a targeted approach to test for specific agents and genes, and may require weeks or months to complete. Integrated DNA sequencing and metagenomic analysis, on the other hand, can be performed in real time, requiring only hours or days to complete an assessment for waterborne pathogens.

Funding: This research was funded in part by Uniting for Health Innovation, an "independent, nonprofit organization that unites government, industry, and local communities in the Americas to advance innovation in public health." Partial support for Steve Hamner's efforts was provided by Jane A. Dubitzky.