Exploring the Remarkable Diversity of Culturable Escherichia coli Phages in the Danish Wastewater Environment

Phages drive bacterial diversity, profoundly influencing microbial communities, from microbiomes to the drivers of global biogeochemical cycling. Aiming to broaden our understanding of Escherichia coli (MG1655, K-12) phages, we screened 188 Danish wastewater samples and isolated 136 phages. Ninety-two of these have genomic sequences with less than 95% similarity to known phages; while most map to existing genera, several represent novel lineages. The isolated phages are highly diverse, estimated to represent roughly one-third of the true diversity of culturable virulent dsDNA Escherichia phages in Danish wastewater, yet 40% are not represented in metagenomic databases, emphasising the importance of isolating phages to uncover diversity. Seven viral families, Myoviridae, Siphoviridae, Podoviridae, Drexlerviridae, Chaseviridae, Autographiviridae, and Microviridae, are represented in the dataset. Their genomes vary drastically in length, from 5.3 kb to 170.8 kb, with a guanine and cytosine (GC) content ranging from 35.3% to 60.0%. Hence, even for a model host bacterium, substantial diversity remains to be uncovered. These results expand and underline the range of coliphage diversity and demonstrate how far we are from fully disclosing phage diversity and ecology.


Introduction
Phages are important ecological contributors, renewing organic matter supplies in nutrient cycles and driving bacterial diversity by enabling the co-existence of competing bacteria through "killing the winner" and by serving as genomic reservoirs and transport units [1,2]. Phage genomes are known to contain auxiliary metabolic genes (AMGs), toxins, and virulence factors [3-7]. Through lysogeny and transduction, they can transfer metabolic traits, including antibiotic resistance, to their hosts and can confer immunity against homologous phages [1].
Still, despite their ecological role, their potential as antimicrobials, and the fact that they carry a multitude of unknown genes with great potential for biotechnological applications, phages are vastly understudied. Around 10,000 phage genomes have now been published [8]. Though the number increases rapidly, we may have merely scratched the surface of the expected diversity. It is estimated that at least one billion bacterial species exist [9]. Hence, phages have been reported for only a tiny fraction of potential hosts. Efforts to disclose the range and diversity of phages targeting a single host have revealed a stunning display of diversity. The most scrutinized phage host is Mycobacterium smegmatis, for which the Science Education Alliance Phage Hunters program has isolated more than

Materials and Methods
The screening for coliphages was performed with the HiTS method as described in [25], though instead of direct plaque sequencing [26], lysates of wells giving rise to plaques were sequenced. In short, an overnight enrichment (37 °C) was performed in microplates with E. coli, media, and wastewater (0.5 mL/well); the next day, the enrichments were filtered (0.45 µm), re-inoculated (∼1 µL), and re-incubated overnight (37 °C); on the third day, a second filtration (0.45 µm) and a spot test (soft-agar overlay) were performed to identify positive wells.

Bioinformatics
Nucleotide (NT) and amino acid (AA) similarities were calculated using tools recommended by the ICTV [21], i.e., BLAST [30] for identification of the closest relative (BLASTn when possible, discontiguous megaBLAST (word size 16) for larger genomes), and Gegenees version 2.2.1 [39] for assessing phylogenetic NT (BLASTn) and AA (tBLASTx) distances of multiple genomes, with a fragment size of 200 bp and a step size of 100 bp. Intergenomic nucleotide sequence similarity and aligned genome fractions between all isolated phage species were plotted with VIRIDIC [40]. NT similarity was determined as percentage query cover multiplied by percentage identity. Novel phages were categorised according to ICTV taxonomy. The criterion of 95% DNA sequence similarity for species demarcation was applied to identify novel species representatives and to determine uniqueness within the dataset. Evolutionary analyses for phylogenomic trees were conducted in MEGA7 version 2.1 (default settings) [41]. These were based on the large terminase subunit gene (Caudovirales), which is commonly applied for phylogenetic analysis [42,43], and on the DNA replication gene (gpA) (Microviridae). The NT sequences were aligned with MUSCLE [44], and the evolutionary history was inferred by the Maximum Likelihood method based on the Tamura-Nei model [45]. The trees with the highest log likelihood are shown. Pairwise whole-genome comparisons were performed with Easyfig 2.2.2 [46] (BLASTn) and curated by adding color codes and identifiers in Inkscape version 0.92.2. The R package iNEXT [47,48] in RStudio version 1.1.456 [49] was used for rarefaction, species diversity (q = 0, datatype: incidence_raw), extrapolation thereof (estimateD), and estimation of sample coverage. The visualisation of genome sizes and GC contents was prepared in Excel version 16.31.
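As an illustration of the NT similarity metric and the 95% species criterion described above, the following is a minimal sketch, assuming query cover and identity are both available as percentages for one query-subject pair. The function names and the example values are hypothetical; VIRIDIC and Gegenees compute their own similarity scores, which this sketch does not reproduce.

```python
def nt_similarity(query_cover_pct: float, identity_pct: float) -> float:
    """Combine BLAST query cover and percent identity into one NT similarity (%).

    Both inputs are percentages in [0, 100]; the product is rescaled to a percentage.
    """
    if not (0 <= query_cover_pct <= 100 and 0 <= identity_pct <= 100):
        raise ValueError("percentages must lie in [0, 100]")
    return query_cover_pct * identity_pct / 100.0


SPECIES_THRESHOLD = 95.0  # species demarcation cut-off (% NT similarity)


def same_species(query_cover_pct: float, identity_pct: float,
                 threshold: float = SPECIES_THRESHOLD) -> bool:
    """True if the combined NT similarity reaches the species threshold."""
    return nt_similarity(query_cover_pct, identity_pct) >= threshold


if __name__ == "__main__":
    # Hypothetical hit: 96% query cover at 98% identity gives 94.08% similarity,
    # just below the cut-off, so the query would count as a distinct species.
    sim = nt_similarity(96, 98)
    print(f"{sim:.2f}% similar; same species: {same_species(96, 98)}")
```

Multiplying the two percentages means a hit with near-perfect identity can still fall below 95% if part of the genome does not align at all, which is the intended behaviour for a whole-genome criterion.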
BLAST+ 2.9.0 [50] was used to perform an NT search of the coliphages (queries) against a database of the IMG/VR v2.0 sequences [51] and the human gut virome database (GVD) v1.7 [52]. Reads from metagenomes and metaviromes were mapped using bbmap 38.22 [53]. Genome breadth and depth coverage were calculated using genomecov from BEDtools 2.28.0 [54] and BamM 1.7.3 [55], respectively.
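The distinction between breadth and depth coverage, and the 70% breadth threshold later used to call read-mapping hits, can be sketched as follows. This assumes a per-position depth list for one phage genome (for example, parsed from the per-base output of `bedtools genomecov -d`); the function names are hypothetical, and the plain mean depth computed here does not reproduce BamM's own coverage modes.

```python
from typing import Sequence


def breadth_and_depth(depths: Sequence[int]) -> tuple[float, float]:
    """Return (breadth, mean depth) for one genome.

    depths: read depth at every position of the reference genome.
    Breadth is the fraction of positions covered by at least one read.
    """
    if not depths:
        raise ValueError("empty genome")
    covered = sum(1 for d in depths if d > 0)
    return covered / len(depths), sum(depths) / len(depths)


def is_significant_hit(depths: Sequence[int], min_breadth: float = 0.70) -> bool:
    """Apply a breadth threshold (reads covering >= 70% of the genome)."""
    breadth, _ = breadth_and_depth(depths)
    return breadth >= min_breadth


if __name__ == "__main__":
    # Toy 10-bp "genome" in which 7 of 10 positions are covered.
    toy = [0, 3, 5, 5, 2, 0, 1, 4, 0, 6]
    breadth, depth = breadth_and_depth(toy)
    print(breadth, depth, is_significant_hit(toy))
```

Breadth rather than depth is the natural criterion for presence: a few reads piling up on one conserved gene give high local depth but low breadth, whereas a true close relative should cover most of the genome.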

Wastewater Coliphages Are Remarkably Diverse
The sequenced coliphages were analysed strictly in silico, focusing on their relatedness to known phages, taxonomy, and distinctive characteristics. The genome assemblies had coverages of 20× to 12,122×, with an average of 390.5× (Table S1). The genome screening algorithms identified no homologs of known virulence or antibiotic resistance genes. Although this does not definitively exclude their presence, it is interpreted as a reduced risk, which is a desirable trait for phage therapy. The majority of genes identified when screening for AMGs code for phage DNA modification pathways (Table S2).
The isolation method (HiTS) favours easily culturable, plaque-forming virulent phages [25]. Still, even though wastewater is a commonly used source for the isolation of coliphages, we identified 136 coliphages, of which 92 differed by ≥5% from published phage genomes, some with nucleotide (NT) similarities as low as 29% (Table 1). Based on BLASTn analyses and the 95% nucleotide similarity demarcation, 104 of the coliphages are unique phage species (Table 1, Figure 1). Based on DNA homology and phylogeny, the 104 unique coliphages group into 14 distinct clusters and 7 single phages (Figure 1 and Figure S1). Coliphages were identified in samples from 44 of the 48 wastewater treatment plants (WWTPs) (Table S1). There was no substantial difference in phage diversity distribution between samples of urban and rural origin (Figure S2). Samples without coliphages likely reflect the crude nature of the screening method and, in some cases, sequencing or assembly issues rather than actual absence. From the majority of positive samples (n = 58), a single phage was sequenced, though some lysates held more than one phage (28 lysates: 2 phages; 6 lysates: 3 phages; 1 lysate: 4 phages; Table S1), for a total of 136 phages.
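One way to turn pairwise similarities and the 95% demarcation into species groups is single-linkage clustering, sketched below with a union-find structure. This is an illustrative assumption, not necessarily how species were delimited in this study (BLASTn and VIRIDIC were used); it also assumes a symmetric similarity function, whereas cover-based BLAST similarities can differ depending on which genome is the query. The phage names and similarity values are hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence


def species_clusters(names: Sequence[str],
                     similarity: Callable[[str, str], float],
                     threshold: float = 95.0) -> list[list[str]]:
    """Group genomes whose pairwise NT similarity (%) reaches `threshold`.

    Single linkage via union-find: if A~B and B~C, all three share a group.
    """
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical similarity table (%), looked up in either order.
    table = {("p1", "p2"): 97.2, ("p1", "p3"): 60.0, ("p2", "p3"): 58.5,
             ("p1", "p4"): 12.0, ("p2", "p4"): 11.0, ("p3", "p4"): 95.4}

    def sim(a: str, b: str) -> float:
        return table.get((a, b), table.get((b, a), 0.0))

    print(species_clusters(["p1", "p2", "p3", "p4"], sim))
```

Single linkage can chain genomes that are not all mutually ≥95% similar; a stricter alternative would require every pair within a group to pass the threshold.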
The 95% nucleotide identity demarcation of species is an arbitrary delimitation. It does not consider the biological importance of the non-identical sequence parts and makes species demarcation depend on genome size. However, it provides a means to quantify and compare relatedness, enabling estimation of, e.g., culturable virulent coliphage species richness in the Danish wastewater environment (Figure 2). An extrapolation of species richness (q = 0) predicts a total of 311 distinct species (requiring a sample size of ∼900 phages) (Table S3). The relatively small sample size in this study (n = 136) may subject the estimate to a large prediction bias. The sampling method also introduces bias by selecting for abundance, latency, and burst size, thereby potentially underestimating diversity. Sequencing and assembly methods, as well as the choice of host, further reduce the number of detected phage genomes. Nonetheless, the results indicate a lower bound on the diversity of culturable virulent dsDNA coliphages (MG1655, K-12) in Danish wastewater, estimated at a minimum of 183 to 350 unique phage species (Figure 2, Table S3).
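To make the asymptotic richness estimate for incidence data more concrete, here is a minimal sketch of the bias-corrected Chao2 estimator, which, to our understanding, is the asymptotic q = 0 estimator iNEXT applies to incidence data. It is not a re-implementation of iNEXT's rarefaction/extrapolation curves or confidence intervals, and the toy incidence data below are hypothetical, not the study's samples.

```python
from collections import Counter
from typing import Iterable, Sequence


def chao2(incidence: Sequence[Iterable[str]]) -> float:
    """Bias-corrected Chao2 species richness estimate from incidence data.

    incidence: one collection per sampling unit listing the species detected
    in it (presence/absence only; repeats within a unit are ignored).
    """
    t = len(incidence)
    if t == 0:
        raise ValueError("need at least one sampling unit")
    # Number of sampling units in which each species was detected.
    freq = Counter(sp for unit in incidence for sp in set(unit))
    s_obs = len(freq)
    q1 = sum(1 for f in freq.values() if f == 1)  # species seen in one unit
    q2 = sum(1 for f in freq.values() if f == 2)  # species seen in two units
    k = (t - 1) / t
    if q2 > 0:
        return s_obs + k * q1 * q1 / (2 * q2)
    return s_obs + k * q1 * (q1 - 1) / 2  # variant used when q2 == 0


if __name__ == "__main__":
    # Four hypothetical samples; A recurs, C, D, and E are each seen once.
    samples = [{"A", "B"}, {"A", "C"}, {"A", "D"}, {"B", "E"}]
    print(chao2(samples))
```

The estimator is driven by the ratio of species seen in only one sample to those seen in exactly two: many singletons relative to doubletons signal that many species remain undetected, which is why a dataset dominated by unique phage species yields a predicted richness well above the observed count.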
The diversity of tailed dsDNA coliphages is well documented [8,22,23], and a screening of nearly 200 wastewater samples would be expected to yield hitherto unknown phages. However, considering the use of only a single host strain and a crude isolation method ensuring that only a single or the few most successful phage(s) from each sample were sequenced, the degree of novelty and diversity revealed is remarkable and verifies our hypothesis, as well as the efficiency of the HiTS method for exploring diverse phages of a single host [25]. Myoviridae, but also found the Podoviridae to be the least abundant coliphages in sewage [22]. However, these distributions likely reflect abundance distributions of culturable phages and not necessarily natural abundances.

Fifty-Five Myoviridae Species
The 55 Myoviridae phage species span the greatest range of genome sizes, from the Suspvirus mistaenkt (86.7 kb) to the Dhakavirus dhaeg (170.8 kb) (Figure 2B), and all except the krischviruses code for tRNAs (Table 1). The Myoviridae group into eight distinct clusters and one single phage (mistaenkt), representing three subfamilies, Tevenvirinae, Vequintavirinae, and Ounavirinae, in addition to the phapecoctaviruses and a cluster of six unclassified Myoviridae (Figure 1, Table 1). The Tevenvirinae represent four genera: two krischviruses, five tequatroviruses, two dhakaviruses, and two mosigviruses, the last notable for their ability to perform arabinosylation of hydroxymethylcytosine (hmC) [56].
The isolated Vequintavirinae are all vequintaviruses closely related (91.1-93.8%, BLAST) to classified species and were identified in samples from 12 of the 48 WWTPs. All of the Ounavirinae except the Suspvirus mistaenkt are felixounaviruses (89.7-93.9%, BLAST). Felixounavirus is a relatively large genus with 16 recognized species isolated from Escherichia and Salmonella. In this study, felixounaviruses were identified 33 times in samples from no fewer than 23 WWTPs, indicating that they are ubiquitous in the Danish wastewater environment and easily cultivated, although the method does not allow their relative abundance to be assessed. Felixounaviruses often have broad within-genus host ranges, and isolates have been shown to rapidly expand their host range when challenged, coinciding with mutations in the long tail gene [57,58]. Five of the Myoviridae are members of the newly announced genus Phapecoctavirus, with substantial similarity (86-90%, BLAST) to the type species Escherichia phage phAPEC8 (JX561091) [23,59]. The five phages in the last of the Myoviridae clusters form an even more homogeneous group than the phapecoctaviruses (Figure 3 and Figure S1). All five are closely related (92-95%, BLAST) to the same five unclassified Enterobacteriaceae phages vB_Ecom_PHB05 (MF805809), vB_vPM_PD06 (MH816848), ECGD1 (KU522583), phi92 (NC_023693), and vB_vPM_PD114 (MH675927) [60,61]; this group of nine phages is distinct (<44% NT similarity, BLAST) from all other described phages and thus represents a yet-to-be-classified genus, presumably with the first sequenced phage, phi92, as type species. Phi92 was isolated in 1982 and has been thoroughly characterised; it has a broad across-genus (Salmonella, Escherichia) host range enabled by multiple divergent tail fibres and can infect both non-capsulated and encapsulated hosts because it has a unique endosialidase tailspike encoded by gene 143 [60,61].
Interestingly, this gene appears to be unique to phi92, though other versions of a putative tailspike are present at the same position in the genomes of alia, PHB05, ECGD1, PD06, and PD114 (Figure 3). Both the phapecoctaviruses and the unclassified Myoviridae genomes code for a complete dTDP-rhamnose biosynthesis pathway. The presence of a dTDP-rhamnose biosynthesis pathway in the DNA metabolism region of phage genomes is peculiar; one possible explanation is that these phages use rhamnose for glycosylation of hydroxymethylated nucleotides, in the same manner as T4 generates glucosyl-hmC [56].

A New Addition to the Small Family Chaseviridae
The distinctive phage flopper shares NT similarity (38.5-87%, BLAST) with only ten other phages; it belongs to the newly established genus Carltongylesvirus (80.8-87% NT similarity, BLAST) of the new family Chaseviridae. This family currently has only nine species, and Carltongylesvirus only two: Escherichia phage phiEcoM_GJ1 (EF460875) and Escherichia phage ST32 (MF044458). Both GJ1 and ST32 have broad within-genus host ranges [62,63]. NT similarity between flopper and GJ1 is low in parts of the putative tail tape measure gene, and similarity among all three phage genomes is low in a tail fibre gene (Figure 4). The carltongylesviruses are unique in having characteristic Myoviridae morphology, i.e., an icosahedral head, neck, and contractile tail with tail fibres, while also coding for RNA polymerases, a feature otherwise characteristic of the T7-like phages of the family Autographiviridae [62,64,65].

Six Microviridae of Two Genera
The single phage Lilleven and the five gequatroviruses belong to the subfamily Bullavirinae, family Microviridae, order Petitvirales, which comprises non-enveloped, icosahedral ssDNA phages (Table 1). Lilleven is a novel species of the genus Alphatrevirus, closely related (93.9% NT similarity, BLAST; 89-90% AA similarity, Gegenees) to the Alphatrevirus Enterobacteria phage St1 (NC_012868) (Figure S3). The five gequatroviruses differ from one another only by single-nucleotide polymorphisms and in non-coding regions (Figure S3). They cluster and share genomic organisation and extensive NT similarity (92.6-94.5%, BLAST) with the unclassified Microviridae Escherichia phage SECphi17 (LT960607), but have only 59.1-67.9% NT similarity (BLAST) with recognised Gequatrovirus species, with which they share almost no sequence similarity in the region coding for the major spike protein (gpG), a distinctive marker of the subfamily Bullavirinae involved in host attachment (Figure S3, Table 1) [66]. However, considering the pronounced gene synteny between their relatively small genomes and a conserved AA similarity (62-64%, Gegenees), they are considered gequatroviruses.
The sequencing of the Microviridae is peculiar, as library preparation with the Nextera® XT DNA kit relies on transposons targeting dsDNA. However, during Microviridae infection, the host polymerase converts the viral ssDNA into a covalently closed dsDNA intermediate, which is then replicated by rolling circle by viral replication proteins transcribed by the host RNA polymerase [67]. This intermediate state may have enabled library preparation. The presence of host DNA (2.8-39.1% of reads) in these samples indicates insufficient initial DNase I treatment (Table S4), which may be attributed to chemical inhibition of the enzyme or its inactivation by adhesion to the sides of the wells. Hence, it is reasonable to assume that the extracted microvirus DNA was captured as free dsDNA inside host cells during ongoing infections.

Twenty Drexlerviridae Phages Including a New Lineage Representative
The 20 species of the new family Drexlerviridae represent a considerable expansion of the new subfamily Tempevirinae [68]. Eight of the Drexlerviridae belong to the new genus Warwickvirus (five species), with Escherichia virus swan01 (LT841308) as type species, as they have ≥84.9% NT similarity (BLAST) to recognised species thereof. The other eight belong to the genus Hanrivervirus (NT: 86-90%, BLAST; AA: 77-85%, Gegenees; Figure S4), which currently consists only of the type species Shigella virus pSf-1 (NC_021331), isolated from the Han River in Korea [69]. The warwickviruses and hanriverviruses isolated in this study all have genome sizes, GC contents, and gene organisation comparable to their respective type species (Figure 5, Table 1). During their differentiation, many deletions and insertions of small hypothetical genes have occurred; most notable is a unique version of a putative tailspike protein in seven of the new Hanrivervirus species and in all of the new Warwickvirus species, indicating a variety of divergent host ranges (Figure 5). All the hanriverviruses code for a (putative) Dam DNA adenine methyltransferase, and pSf-1 is resistant to at least six restriction endonucleases [69], suggesting that these phages employ DNA methylation as a defence strategy. The last Drexlerviridae is Jahat. The warwickviruses and hanriverviruses form a monophyletic clade together with Jahat (Figure 1 and Figure S1). Even though Jahat has its own branch, this phage shows gene synteny and a slightly higher but comparable GC content, and shares a similar degree of NT similarity (≤68.7%) with phages of both Hanrivervirus and Warwickvirus (Figure 5). Hence, Jahat cannot be assigned with confidence to either genus but falls in between, barely different enough to represent its own genus; this is an indicator of the genetic continuum of phages that challenges taxonomic delimitation.

Eight Siphoviridae Species and a Novel Genus Representative
The eight Siphoviridae species vary greatly in GC content, ranging from 44.6% (Skure) to 54.6% (welsh), but are quite similar in genome size (49.7-54.6 kb) (Figure 2). Five of these phages belong to the genus Dhillonvirus, as they have substantial NT similarity (77-80%, BLAST) and pronounced gene synteny with the type species Escherichia virus HK578. As with the hanriverviruses and warwickviruses, their genomes differ only in minor hypothetical genes and show limited NT similarity in a gene of highly variable length coding for a tail fibre (gp26 in HK578) (Figure 6), a phenomenon also observed in the dhillonviruses isolated by Korf et al. (2019), which correspondingly had divergent host ranges [23]. Each of the three remaining Siphoviridae represents a different genus. Based on NT similarity and the presence of the canonical 7-deazaguanine operon, Skure belongs to the 13-species genus Seuratvirus, while buks is assigned to the two-species genus Jerseyvirus, subfamily Guernseyvirinae. Interestingly, the Siphoviridae Halfdan has only minimal similarity with described phages (12-29%, BLAST). These include two Pseudomonas phages, vB_PaeS_SCUT-S3 (MK165657) and Ab26 (HG962376) [70], both septimatreviruses; two Acinetobacter phages of the genus Lokivirus, IMEAB3 (KF811200) and the type species Acinetobacter virus Loki [71]; and, to a lesser degree, the unclassified Achromobacter phage phiAxp-1 (KP313532) [72]. They share a common gene organization, yet their intra-Gegenees scores are low (≤1% BLASTn, <43% tBLASTx, Figure S5), and NT similarity is negligible in roughly one-third of Halfdan's 57 CDSs (Figure 7). The TerL-based phylogeny and AA similarity also indicate a distant relation, although they group Halfdan closer to the lokiviruses (40-43%, Gegenees tBLASTx) than to the septimatreviruses (33-34%, Gegenees tBLASTx) (Figure 7, Figure S5). Halfdan is therefore clearly distinct from all other described phages and is the first sequenced phage of a new Siphoviridae genus.

Nine Autographiviridae Species
The nine Autographiviridae all have the hallmarks of this new family, i.e., unidirectionally encoded genes and RNA polymerases [65,68,73]. They belong to the genus Bonnellvirus, as they have conserved gene organisation and similar GC content, and share considerable NT similarity (69-93%, BLAST) with the type species Enterobacteria phage J8-65 (NC_025445) (Figure S6). The genomes of the nine new bonnellviruses and J8-65 are highly similar, differing primarily in small hypothetical genes, though Lidtsur codes for a unique version of a tailspike colanidase (Figure S6). Lidtsur was deposited in NCBI GenBank before the others and is currently the only one that is an ICTV-approved species representative.

Four Podoviridae Species Including Two Novel Genus Representatives
The four Podoviridae all have high (>59%) GC contents and represent no fewer than three distinct genera (Figure 2, Figure 8). Skarpretter is the type and only species of the genus Skarprettervirus [74]. Skarpretter is distinct from all described phages, sharing only 38% NT similarity (BLAST) with the Giessenvirus Escherichia phage C130_2 (MH363708), isolated from cheese [75] (Figure 8 and Figure S7). Sortsne is the type species of the genus Sortsnevirus [74], which currently consists of only Sortsne and Klebsiella phage vB_KpnS_IME279 (MF614100); however, based on high NT similarity (89.8%, BLAST) and conserved gene organization with IME279, we suggest that sortkaff also belongs to Sortsnevirus (Figure 8 and Figure S7). The last Podoviridae, sortsyn, belongs to the new two-species genus Murrayvirus [76], as it shares a high degree of NT similarity and conserved gene organization with the type species Enterobacteria phage IME_EC2 (KF591601), isolated from hospital sewage [77] (Figure 8).

The Wastewater Coliphages Are Largely Absent in Metaviromes
To investigate the prevalence of the 104 coliphage species in different environments, we mapped the reads of 510 metagenomes from studies of primarily Danish wastewater, pig, and human gut samples (Table S5) [78,79]. The threshold for significant hits was set as mapped reads covering ≥70% of a coliphage genome, and the distribution of the mapped reads was assessed to verify that this threshold ensured identification of closely related phages (Figure S8). No hits were found for any of the coliphages. This is likely a consequence of sequencing depth and sample preparation: prior to sequencing, these metagenome samples were concentrated by centrifugation (as a pellet) or by CsCl gradient, and the supernatant was either discarded or stored for future studies, so a large proportion of potential phage reads was omitted. We then searched for the coliphages in hundreds of metavirome datasets (Table S5) from Irish and Chinese faecal, human, animal, and water samples using the same read-mapping method (Table S6). There were no hits to the human faecal contigs from Ireland [80], while 22 of the 104 coliphage genomes (21%), representing ten genera, had >70% of their genome covered by reads from 10 of the 38 (26%) libraries of the Chinese Wang et al. study (mammal and bird samples) (Figure 9) [80]. For most phage genera, reads from only a single sample matched, although reads from five metaviromes (pet dog, pig, yak, and flamingo faeces) matched (>70% read coverage) the Alphatrevirus Lilleven, and reads from seven metaviromes (dog, red panda, giant panda, non-human primate, masked civet, pig, and chicken faeces) matched (>70% read coverage) the Carltongylesvirus flopper (Table S6).
Finally, the genome sequences of the 104 coliphages (queries) were searched with BLAST against a database of 735,106 uncultured viral genomes (UVIGs) from the Integrated Microbial Genomes/Virus (IMG/VR) database, derived from a wide range of sample types including marine, freshwater, terrestrial, and host-associated samples [51], as well as 13,203 UVIGs from human gut samples retrieved from the human gut virome database (GVD) (Table S7) [52]. The coliphage genomes were also searched against the 8392 isolated virus genomes (iVGs) of the IMG/VR database, and based on the observed alignment coverage distribution (Figure S9), significant matches were defined as those covering >80% of a coliphage genome. With this threshold, 23 of the 104 (22%) coliphage genomes had significant matches to four of the 735,106 (0.0005%) IMG/VR UVIG sequences (Figure 9, Table S7).
Figure 9. Significant hits for close relatives of the phages isolated in this study in three databases. Hits are defined as ≥80% genome coverage when blasting the coliphage genomes against the IMG/VR and GVD databases, and as mapped reads covering ≥70% of individual coliphage genomes when mapping reads from the Wang study. The coliphages are grouped according to genera, and numbers in parentheses denote the number of coliphage species representing each genus; only genera with significant hits are shown. Color codes of genera denote taxonomic family.
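The ">80% of a coliphage genome" criterion used for the database searches requires summing the query positions covered by all alignments (HSPs) to one database sequence without double-counting overlaps. The sketch below shows one way to do this, assuming 1-based inclusive query coordinates such as the qstart/qend columns of BLAST tabular output, grouped per query-subject pair. The function names are hypothetical; this is not necessarily how coverage was computed in this study.

```python
from typing import Iterable


def query_coverage(hsps: Iterable[tuple[int, int]], query_len: int) -> float:
    """Fraction of the query covered by the union of its HSPs.

    hsps: (qstart, qend) pairs, 1-based and inclusive; reversed pairs are swapped.
    """
    if query_len <= 0:
        raise ValueError("query length must be positive")
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    merged_start = merged_end = None
    for start, end in intervals:
        if merged_end is not None and start <= merged_end + 1:
            merged_end = max(merged_end, end)  # overlapping or adjacent: extend run
        else:
            if merged_end is not None:
                covered += merged_end - merged_start + 1  # close previous run
            merged_start, merged_end = start, end
    if merged_end is not None:
        covered += merged_end - merged_start + 1
    return covered / query_len


def is_significant_match(hsps: Iterable[tuple[int, int]], query_len: int,
                         min_cover: float = 0.80) -> bool:
    """Apply the > 80% genome-coverage criterion to one query-subject pair."""
    return query_coverage(hsps, query_len) > min_cover


if __name__ == "__main__":
    # Hypothetical 10 kb phage with three HSPs, two of which overlap.
    hsps = [(1, 4000), (3500, 6000), (8001, 9000)]
    print(query_coverage(hsps, 10_000), is_significant_match(hsps, 10_000))
```

Merging intervals first matters because large phage genomes often produce many overlapping HSPs against a related sequence; simply adding HSP lengths would overestimate coverage and could push unrelated genomes over the threshold.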
Only phages from 14 of the 24 taxonomic groups of coliphages from this study had matches in the virome databases assessed. For only 14 of the 62 coliphages with matches could a closely related phage be identified in more than one virome. Although the coliphages are omnipresent and culturable in Danish wastewater, they are largely not represented in metagenomic data, and these coliphage genomes therefore provide valuable information. A lack of representation in metagenomic data could be caused by low natural abundance, which would result in insufficient sequencing depth for genome assembly within metagenomes/-viromes. The Siphoviridae Halfdan, the Myoviridae mistaenkt, and the phages of the Drexlerviridae genus Warwickvirus, the Autographiviridae genus Bonnellvirus, all the Podoviridae genera (Sortsnevirus, Murrayvirus, and Skarprettervirus), as well as the Microviridae genus Gequatrovirus did not match any UVIG sequences, nor did the reads from any virome cover ≥70% of their genomes. Surprisingly, the gequatroviruses were not detected in any of the gut viromes, even though, apart from temperate and crAss-like phages, Microviridae dominate human, mammal, and bird gut microbiomes, and 860 Microviridae genomes were assembled from the assorted Wang et al. (2019) metaviromes [80,81]. However, owing to the relatively small size of Microviridae genomes, substitution of a single gene is enough to cause a ∼5-33% difference in NT similarity, putting them below the set threshold for identification. That Halfdan, the bonnellviruses, and all the Podoviridae represent novel genera or genera with very few close relatives suggests that these lineages are under-sampled and not sufficiently abundant in the environments explored by metagenomic sequencing and assessed in this study. These findings underline the importance of isolating and sequencing individual phages in order to uncover diversity.
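Why a single gene substitution weighs so heavily in a small genome follows directly from defining NT similarity as query cover multiplied by identity. The sketch below assumes, for simplicity, that everything except the substituted gene aligns at a fixed identity, so query cover drops by exactly the gene's share of the genome. The 1 kb gene length and the 50 kb comparison genome are hypothetical round numbers; only the 5.3 kb genome size comes from this dataset's range.

```python
def similarity_after_gene_swap(genome_len: int, gene_len: int,
                               identity_pct: float = 100.0) -> float:
    """NT similarity (%) when one gene is replaced by unrelated sequence.

    Simplifying assumption: all other positions align at `identity_pct`,
    so query cover falls by exactly gene_len / genome_len.
    """
    if not 0 <= gene_len <= genome_len:
        raise ValueError("gene must fit inside the genome")
    cover_pct = 100.0 * (genome_len - gene_len) / genome_len
    return cover_pct * identity_pct / 100.0


if __name__ == "__main__":
    # Hypothetical 1 kb gene swap in a small (5.3 kb) vs. a larger (50 kb) genome.
    for genome_len in (5_300, 50_000):
        sim = similarity_after_gene_swap(genome_len, 1_000)
        print(f"{genome_len} bp genome: {sim:.1f}% NT similarity")
```

Under these assumptions, the same 1 kb substitution leaves a 50 kb genome above the 95% species cut-off but drops a 5.3 kb genome to roughly 81%, the same genome-size dependence noted for the 95% species demarcation.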
It is plausible that the phages selected for by plating techniques are not those that are naturally abundant; however, this cannot be concluded from these results. Future studies should compare the diversity obtained by isolation with that obtained by metagenomic sequencing of metaviromes from identical samples in order to establish the degree of discrepancy between the two methods.

Conclusions
By screening 188 wastewater samples, we identified 104 coliphage species (host: E. coli MG1655, K-12), enabling us to predict the species richness of culturable virulent dsDNA coliphages in Danish wastewater, which is estimated to be at least 183-350 species and is expected to fluctuate drastically over time. The true species richness is likely even higher, as the isolation, DNA extraction, library construction, and genome assembly methods, as well as the choice of host, are all liable to reduce the number of phages detected. Ninety-two of the newly isolated coliphages represent novel species of seven families: Myoviridae, Siphoviridae, Podoviridae, Drexlerviridae, Chaseviridae, Autographiviridae, and Microviridae. Although most of them fall into 18 established genera, the diversity of so many phages isolated on a single strain is notable. They vary greatly in genome size and span a broad GC content range.
Apart from the analyses applied, the main difference between this study and the comprehensive 2019 study by Korf et al. [23] is the isolation approach. Korf et al. isolated 50 phages from various sample types over several years on a wide collection of clinical E. coli isolates, whereas the wastewater sample collection and phage isolation in this study were performed within a few weeks on a single E. coli strain. Still, the distributions of phage types, including many of the same genera, and the discovery of a handful of phages with limited similarity to known phages are in many respects comparable, suggesting that the method of isolation (plaque purification) is perhaps the key limiting factor for uncovering the diversity of coliphages. However, fewer than 60% of the 104 coliphages are represented in the assessed metaviromes, emphasising the importance of cultivating phages to uncover the true diversity.
These findings add to our understanding of phage ecology and diversity, and the classification of these many phages brings us another step closer to a more refined taxonomic understanding of phages. Furthermore, the numerous and diverse phages isolated in this study, all lytic to the same single strain, offer an excellent opportunity to study important phage-host interactions in future work. These include, but are not limited to, lysogen-induced phage immunity, host range, and anti-restriction (anti-RE) systems.
Finally, the first representatives of at least three novel genera were sequenced in this study. Skarprettervirus and Sortsnevirus of the Podoviridae have been accepted by the ICTV. We propose that Halfdan is the type species of a new Siphoviridae genus and that the four novel Myoviridae species muut, alia, outra, and inny, together with five unclassified Myoviridae, also represent a new genus; and as the Drexlerviridae Jahat cannot be assigned with confidence to any described genus, Jahat may also represent its own lineage. In conclusion, this study shows that uncharted territory remains even for well-studied phage hosts and that cultivation approaches uncover vital genomes that seem absent from metagenomic studies.