Exploring the remarkable diversity of Escherichia coli Phages in the Danish Wastewater Environment, Including 91 Novel Phage Species

Phages drive bacterial diversity, profoundly influencing diverse microbial communities, from microbiomes to the drivers of global biogeochemical cycling. The vast genomic diversity of phages is gradually being uncovered as >8000 phage genomes have now been sequenced. Aiming to broaden our understanding of Escherichia coli (MG1655, K-12) phages, we screened 188 Danish wastewater samples (0.5 ml) and identified 136 phages, of which 104 are unique phage species and 91 represent novel species, including several novel lineages. These phages are estimated to represent roughly a third of the true diversity of Escherichia phages in Danish wastewater. The novel phages are remarkably diverse and represent four different families: Myoviridae, Siphoviridae, Podoviridae and Microviridae. They group into 14 distinct clusters and nine singletons without any substantial similarity to other phages in the dataset. Their genomes vary drastically in length, from merely 5 342 bp to 170 817 bp, with an impressive span of GC contents ranging from 35.3% to 60.0%. Hence, even for a model host bacterium, in the go-to source for phages, substantial diversity remains to be uncovered. These results expand and underline the range of Escherichia phage diversity and demonstrate how far we are from fully disclosing phage diversity and ecology.


The high-throughput screening method favours easily culturable plaque-forming lytic phages. Still, we identified 104 unique Escherichia phages, of which only 16% were ≥95% similar (BLAST) to already published phages (Table 2). Phages were identified in wastewater samples from 43 of the 48 investigated treatment facilities. From the majority of positive samples (58) a single phage was sequenced; in some samples, however, the lysate held more than one phage. Twenty-five of the lysates held two phages, eight lysates held three phages and one had as many as four phages. Of the 104 unique phages, 91 represent novel species (Table 1, 2). Of these, 51 differed by ≥10% from published phage genomes and some have NT similarities as low as 29% (Table 2).
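As a quick consistency check, the lysate counts reported above can be tallied to confirm that they add up to the 136 sequenced phages. The sketch below uses only numbers stated in the text. The variable names are illustrative, and treating each lysate as one positive sample is an assumption made for the purpose of the tally.

```python
# Number of lysates (values) that held a given number of phages (keys),
# taken from the counts reported in the text.
lysates_by_phage_count = {1: 58, 2: 25, 3: 8, 4: 1}

# Assumption: each lysate corresponds to one positive wastewater sample.
positive_samples = sum(lysates_by_phage_count.values())

# Total phages sequenced across all lysates.
sequenced_phages = sum(n_phages * n_lysates
                       for n_phages, n_lysates in lysates_by_phage_count.items())

# Share of the 188 screened samples that yielded at least one phage.
screened_samples = 188
positive_fraction = positive_samples / screened_samples

print(positive_samples, sequenced_phages, round(positive_fraction, 3))
```

Under that assumption, the tally gives 92 positive samples out of 188 screened, and the phage total matches the n = 136 used in the richness extrapolation.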
These newly sequenced phages represent a substantial quota of divergent lytic Escherichia phages in Danish wastewater, but are still far from disclosing the true diversity thereof (Figure 2). An extrapolation of species richness (q = 0) predicts a total of 292 distinct species (requiring a sample size of ∼900 phages). The relatively small sample size in this study (n = 136) may subject the estimation to a large prediction bias. The sampling method also introduces a bias by selecting for abundance and burst size, thereby potentially underestimating [...]

[...] Table 2). GC contents also vary greatly, from only 35.3% (Tequatrovirus teqhad) up to 60.0% (the unclassified sortsne) (Figure 3, [...]
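To illustrate how a richness extrapolation of order q = 0 works in principle, the sketch below implements the bias-corrected Chao1 estimator, a common nonparametric richness estimator. The text does not state which estimator was used, so this is not necessarily the method behind the reported figure of 292 species. The abundance counts are hypothetical toy data, not data from this study. The final lines only relate the reported numbers (104 observed, 292 predicted) to the "roughly a third" stated in the abstract.

```python
from collections import Counter

def chao1_bias_corrected(abundances):
    """Bias-corrected Chao1 richness estimate.

    S_est = S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)),
    where f1 and f2 are the numbers of species observed exactly once and twice.
    """
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)            # observed species richness
    freq = Counter(counts)
    f1 = freq[1]                   # singletons
    f2 = freq[2]                   # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical per-species isolate counts (illustration only).
toy_counts = [1, 1, 1, 1, 2, 2, 3, 5]
toy_estimate = chao1_bias_corrected(toy_counts)

# Fraction of the predicted richness recovered, using the reported numbers.
observed_species = 104
predicted_species = 292
fraction_recovered = observed_species / predicted_species

print(toy_estimate, round(fraction_recovered, 2))
```

Many singletons relative to doubletons push the estimate well above the observed richness, which is the pattern expected when most species are recovered only once.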

The genome screening algorithms identified no sequences coding for homologs of known virulence or antibiotic resistance genes. Though not a definitive exclusion, this suggests a reduced risk of their presence, a preferable trait for phage therapy application. Currently available tools for AMG screening of viromes did not provide a comprehensive and exclusive assessment of the AMG pool in the dataset. The majority of genes identified are not AMGs, but code for phage DNA modification pathways (Table S1). The function of some of [...]

[...] (Table 2) and code for NAMPT not present in cluster VI (Table S1). As a group, cluster VII are even more homogeneous than cluster VI and all are closely related (92- [...]

[...] verified morphology of ST32, phiEcoM_GJ1, PM1 and PP101, icosahedral head, neck and a contractile tail with tail fibres, classifies them as myoviruses [59,60,62,63]. Based on NT similarity and genome synteny, these seven phages belong to the same, not yet classified, peculiar lineage first described by Jamalludeen et al. (2008) [60].

The sequencing of the microviruses is peculiar, as library preparation with the Nextera® XT DNA kit [...]

In spite of similar genome sizes, the large group of Siphoviridae is the most diverse (28 unique, 24 novel) in this study, with GC contents ranging from 43.9% to 54.6% (Figure 3, Table 2). The majority, clusters IV-V and [...]

[22], their genomes mainly differ in minor hypothetical genes and in putative tail-tip proteins, indicating divergent host ranges (Figure S4). Based on NT similarity and the presence of the canonical 7-deazaguanine

Remarkably, halfdan has no NT similarity with known Escherichia phages, which could be an indication of E. [...]

The nine novel Podoviridae (cluster XIV, no [...]

Cluster XIV and J8-65 form a diverse monophyletic clade, with a substantial number of deletions and insertions between them, subdividing into three sub-clusters with intra-Gegenees scores ≤28% (Figure 5a, d).

Phage lidtsur is singled out and also codes for a unique version of tailspike colanidase, smaasur resembles phage J8-65 and the rest group together (Figure 5). Still, the three sub-clusters have for a large part conserved AA sequences (66-72%, Gegenees) (Figure 5b, c, d). Interestingly, this also applies to Limezero, with which they have a Gegenees NT score ≤1%, but an AA similarity of 45-48%, supporting the phylogenetic grouping of Limezero and cluster XIV (Figure 5a, b, c). Based on phylogeny, limited NT and low AA similarities, it is evident that there is a very distant relation between cluster XIV (and J8-65) and the phikmvviruses (Figure 5) [...]

[...] vB_KpnS_IME279, IME_EC2 and lumpael form a monophyletic clade, the Gegenees score (5-15%) between sub-clusters is surprisingly low (Figure 6a, b). Still, these six phages have comparable genome sizes (41.5-42.5 kb) and organisation, including similar relatively small-sized structural proteins, dam and dcm genes, equally high GC contents (59.5±0.5%) and relatively high AA similarities (49-91%) (Figure 6a, c, Table 2). Hence, they are clearly of the same lineage and likely to resemble IME_EC2 in having Podoviridae morphology. This group is also clearly distinct from all other known phages (<5%, BLAST) and as such constitutes a novel genus, with a delimitation to be determined by future physical characterisation. Skarpretter and C130_2 form a