A Quest of Great Importance-Developing a Broad Spectrum Escherichia coli Phage Collection

Shigella spp. and enterotoxigenic Escherichia coli are the most common etiological agents of diarrheal diseases in malnourished children under five years of age in developing countries. The ever-growing issue of antibiotic resistance and the potential negative impact of antibiotic use on infant commensal microbiota are significant challenges to current therapeutic approaches. Bacteriophages (or phages) represent an alternative treatment that can be used to treat specific bacterial infections. In the present study, we screened water samples from both environmental and industrial sources for phages capable of infecting E. coli laboratory strains within our collection. Nineteen phages were isolated and tested for their ability to infect strains within the ECOR collection and E. coli O157:H7 Δstx. Furthermore, since coliphages have been reported to cross-infect certain Shigella spp., we also evaluated the ability of the nineteen phages to infect a representative Shigella sonnei strain from our collection. Based on having distinct (although overlapping in some cases) host ranges, ten phage isolates were selected for genome sequencing and morphological characterization. Together, these ten selected phages were shown to infect most of the ECOR library, with 61 of the 72 strains infected by at least one phage from our collection. Genome analysis of the ten phages allowed classification into five previously described genetic subgroups plus one previously underrepresented subgroup.


Introduction
10% of global child mortalities are caused by diarrheal diseases during the first five years of life [1,2]. Most of these deaths occur in so-called developing countries, particularly in sub-Saharan Africa and southern Asia [2,3]. The etiology of diarrheal diseases depends on a number of factors including the age of the child and the geographical location [3]. However, two groups of bacterial pathogens, Shigella spp. and enterotoxigenic Escherichia coli (ETEC) are predominantly associated with moderate-to-severe diarrhea in children irrespective of age or location [3]. There are only a small number of antimicrobials and particular measures, mainly pertaining to personal hygiene and food and water sanitization, that may prevent the spread of such bacterial infections and decrease the severity of the associated illness [4]. However, many Shigella and E. coli strains have become resistant to a variety of widely applied antimicrobials [5][6][7]. As it is more challenging to prevent infections in developing

Bacterial Strains and Growth Conditions
Six strains of prophage-free Escherichia coli were used to isolate and propagate phages directly from environmental samples: BL21 [24], K12 [25], EC101, DH5α, XL1 Blue and Top10 (Thermo Fisher Scientific). ECOR, a collection of 72 E. coli strains [26], E. coli O157:H7 ∆stx [27,28] and Shigella sonnei 53G [29] were then used to further assess the host range of isolated bacteriophages. The ECOR collection encompasses 72 reference strains of E. coli that are predicted to represent the phenotypic as well as genotypic diversity of the species [26]. It comprises strains of human, animal and plant origin from a wide geographical spread (for details of the collection, see [26]). S. sonnei 53G was grown in Brain Heart Infusion (BHI) broth (Oxoid, U.K.) at 37 °C without aeration. E. coli strains were grown in Luria-Bertani (LB) broth (1% NaCl, 1% tryptone and 0.5% yeast extract) at 37 °C with aeration. All bacterial strains were preserved as glycerol stocks at −80 °C.

Phage Isolation from Environmental Samples
Water samples were obtained from both natural (over thirty samples collected from springs and rivers across counties of the Southern, Northern and Western provinces of Ireland, including for instance rivers and lakes in county Cork, Glencar Waterfall streams, Connemara National Park streams and puddles) and industrial (more than ten sewage water samples from manufacturing plants located in Belgium and in Ireland) sources. All water samples were filtered using 0.45 µm membrane filters in order to remove bacterial contamination. Chicken meat pieces were placed in a stomacher bag containing a minimal volume of SMG buffer (200 mM NaCl, 10 mM MgSO4 (Sigma Aldrich, Wicklow, Ireland), 50 mM Tris-HCl (Sigma Aldrich, Wicklow, Ireland), pH 7.5, 0.01% gelatin (Sigma Aldrich)) and homogenized using a stomacher. The samples were centrifuged at 5000× g for 10 min and the resulting supernatant was then filtered using a 0.45 µm membrane. The presence or absence of bacteriophages was determined using the double agar layer method [30]. 100 µL of fresh prophage-free E. coli overnight culture was added to 4 mL of soft LB agar (0.6% agar) and overlaid on an LB agar plate. 150 µL and 300 µL of filtered water samples were pipetted on top of the bacterial layer. The plates were examined for the presence of plaques after overnight incubation at 37 °C. If necessary, samples were enriched at a 1:1 ratio in double-strength LB broth containing 1% (v/v) overnight culture of the appropriate prophage-free strain. The enriched samples were propagated overnight and plated using the double agar layer method as described above. Single plaques were propagated on an actively growing culture inoculated with 1% culture and plaque-purified twice on the same host. Phage lysates were filtered using a 0.45 µm membrane and the titer was estimated using a previously described double layer titration method [30]. Phage dilutions were performed in sterile SMG buffer. Filtered lysates were stored at 4 °C until further use.
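The titer estimate from a double agar layer plate follows from the plaque count, the dilution that was plated and the volume plated. A minimal sketch of that arithmetic is given below. The function name, its parameters and the example numbers are illustrative assumptions and are not values from this study.

```python
def titer_pfu_per_ml(plaque_count: int, dilution: float, plated_volume_ml: float) -> float:
    """Estimate a phage titer (pfu/mL) from one countable plate.

    dilution is the fraction of the original lysate present in the plated
    sample (e.g. 1e-5 for a 10^-5 dilution); plated_volume_ml is the volume
    of that dilution added to the soft agar overlay.
    """
    if plaque_count < 0:
        raise ValueError("plaque_count must be non-negative")
    if not 0 < dilution <= 1:
        raise ValueError("dilution must be in the interval (0, 1]")
    if plated_volume_ml <= 0:
        raise ValueError("plated_volume_ml must be positive")
    return plaque_count / (dilution * plated_volume_ml)


# Hypothetical example: 42 plaques counted from 0.1 mL of a 10^-5 dilution.
example_titer = titer_pfu_per_ml(42, 1e-5, 0.1)
print(f"{example_titer:.2e} pfu/mL")
```

With these assumed inputs the estimate is on the order of 4.2 × 10^7 pfu/mL. In practice, counts from several dilutions and replicate plates would normally be averaged.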

Plaque Morphology Analysis
A double agar layer method was used to obtain single plaques of the twice plaque-purified phages. After overnight incubation at 37 °C, the plaques were measured and photographed using a Gene Genius Bio-imaging System (Syngene, Cambridge, UK).

Host Range Determination
The 72 strains of the ECOR collection, as well as E. coli O157:H7 ∆stx and S. sonnei 53G, were used for host range analysis. 100 µL of the relevant overnight bacterial culture was added to 4 mL of molten LB soft agar and spread on an LB plate. 10 µL of a given phage lysate with a minimum titer of 10^7 pfu/mL was spotted on the surface of each plate. The plates were left to dry, incubated overnight at 37 °C and then examined for lysis zones. All assays were performed at least in triplicate; where the zones of lysis were hazy or did not resolve into individual plaques, infectivity was evaluated using plaque assays. Markov Cluster Algorithm (MCL) clustering of the host range data was performed using MeV (Multi Experiment Viewer; http://mev.tm4.org/) software.
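Host range clustering in this study was done with MCL in MeV, and the sketch below does not reproduce that algorithm. It only illustrates, with assumed data, a simpler pairwise comparison (Jaccard similarity) that shows how overlapping host range profiles can be quantified before any clustering step. The phage and strain names are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical host range data: phage -> set of strains on which lysis was confirmed.
host_range = {
    "phageA": {"ECOR01", "ECOR02", "ECOR03"},
    "phageB": {"ECOR02", "ECOR03", "ECOR04"},
    "phageC": {"ECOR10"},
}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared hosts divided by hosts infected by either phage."""
    union = a | b
    if not union:
        return 1.0  # two empty profiles are treated as identical
    return len(a & b) / len(union)


def pairwise_similarity(profiles: dict) -> dict:
    """Return {(phage_1, phage_2): similarity} for every unordered pair."""
    return {
        (p, q): jaccard(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


similarity = pairwise_similarity(host_range)
for (p, q), score in similarity.items():
    print(f"{p} vs {q}: {score:.2f}")
```

A similarity matrix of this kind could serve as input to a graph clustering method, but the choice of similarity measure here is an assumption and is not taken from the MeV workflow.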

Phage Morphology Assessment
In order to perform an electron microscopic evaluation of the isolated phages, they were purified using discontinuous caesium chloride density gradient ultracentrifugation [31] and dialyzed against phage buffer (20 mM Tris-HCl (pH 7.2), 10 mM NaCl, 20 mM MgSO4) [32]. Adsorption of CsCl-purified phages to freshly prepared carbon film floated from a freshly coated mica sheet on a 400-mesh copper grid (Agar Scientific, Essex, UK) and negative staining with 2% (w/v) uranyl acetate were performed as described previously [32]. Specimens were examined with a Tecnai 10 transmission electron microscope (FEI, Eindhoven, The Netherlands) operated at an acceleration voltage of 80 kV.
[33,34]. The genomes were visualized and edited using Artemis Release 15.0.0 (http://www.sanger.ac.uk/science/tools/artemis) and nucleotide BLAST (BLASTn). The percentage of similarity between the phage proteins was acquired using protein BLAST (BLASTp).

Proteomic Analysis of JK16
JK16 phage particles were purified (see above) and phage proteins were concentrated using methanol-chloroform precipitation [32]. Phage proteins were separated using SDS-PAGE on a 12.5% polyacrylamide gel. The gel was stained with 0.25% Coomassie blue. Protein bands were excised from the gel and de-stained, and the proteins were digested using trypsin-Gold [32]. The samples were analyzed using electrospray ionization-tandem mass spectrometry (ESI-MS/MS), as described previously [32].

Phage Isolation, Plaque Morphology and Host Range Determination
Forty-one phages were isolated from water samples. Additionally, five phages were isolated from one sample of chicken meat using the prophage-free laboratory strains in the primary screen. The host range of these isolates was assessed employing the ECOR72 strain collection. A substantial number (twenty-five) of phages displayed host range profiles identical to that of at least one other phage of the collection; these were not investigated further, and one representative phage of each of the 19 host range groups was selected for further characterization (Table 1). The phages were also shown to exhibit a variety of plaque morphologies, with the majority displaying medium-sized plaques with a diameter of ≤ 3 mm. Four phages (JK27, JK28, CM1 and CM8) formed tiny, regular plaques of 0.5 mm diameter, while JK16 and JK42 formed large plaques (up to 4 mm diameter), with an obvious "halo" surrounding the plaque in the case of JK16. The plaque morphologies of individual phage isolates on susceptible strains were not observed to differ significantly in size or appearance; the typical morphologies on the relevant isolation host strain are presented in Table 1.
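The selection of one representative per host range group can be pictured as grouping isolates whose sets of susceptible strains are identical. The following is a minimal sketch of that idea under assumed data. The isolate names, strain names and the rule of taking the alphabetically first isolate as representative are all hypothetical and do not describe how representatives were chosen in this study.

```python
from collections import defaultdict


def group_identical_profiles(profiles: dict) -> list:
    """Group phages whose host range profiles (sets of strains) are identical."""
    groups = defaultdict(list)
    for phage in sorted(profiles):
        # frozenset makes the profile hashable and independent of strain order
        groups[frozenset(profiles[phage])].append(phage)
    return list(groups.values())


# Hypothetical isolates and the strains they lyse.
isolates = {
    "iso1": {"s1", "s2"},
    "iso2": {"s2", "s1"},
    "iso3": {"s3"},
    "iso4": {"s1", "s2"},
}

groups = group_identical_profiles(isolates)
# Assumed rule: keep the alphabetically first isolate of each group.
representatives = [members[0] for members in groups]
print(groups)
print(representatives)
```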
The host range of the phage collection was ascertained on a panel of 73 E. coli strains as well as a single Shigella sonnei strain. Table 1 highlights that the phage isolates lyse 15-54% of the panel of strains in our collection. To assess whether the host range profiles were unique or overlapping and whether patterns of infection could be discerned, a heat-map of the infection profiles was generated. Based on the identification of distinct/overlapping host range profiles, the collection of nineteen phages was divided into three main groups (Figure 1). The first cluster was represented by JK28, JK32, JK27, JK36 and JK38. Three phages of this cluster (JK27, JK28 and JK32) had been isolated from a waterfall stream, while two phages (JK36 and JK38) originated from sewage water. The second cluster was represented by seven phages which had been isolated from sewage water or a waterfall stream: JK42, JK35, JK33, JK25, JK29, JK40 and JK45. Two phages isolated from chicken meat (CM1 and CM8) made up the third cluster, together with environmental water isolates (JK23 and JK16) and sewage isolates (JK56 and JK58). Based on the heat-map presented in Figure 1, CM1 appears to be intermediate between the cluster 1 and cluster 3 phages. However, CM1 shares 15 host strains with cluster 3 phages and only eight with cluster 1 phages; therefore, based on host range, it is grouped among the cluster 3 phages. Another sewage isolate, phage JK55, had a very different host range from all other phages and is thus considered distinct (Figure 1). Phage infectivity (i.e., number of infected strains divided by total number of tested strains × 100%) ranged from 14% (JK55) to 54% (JK23) (Table 1). Altogether, the tested phages infected all but 11 ECOR strains (Figure 1). All tested phages infected S. sonnei 53G (Figure 1).
Table 1. Host range and plaque morphology of the 19 phages examined in this paper.
Phages were isolated from various environments: natural springs, industrial waste water and chicken meat. The infectivity of the phages was calculated based on the host range (Figure 1). Phages exhibited various plaque morphologies, varying from very small (less than 0.5 mm diameter) to large (up to 4 mm diameter). Plaques were formed on agar plates with the E. coli host that was used to isolate the phages.

[Table 1 body: only fragments survive. Column headers: Phage, Source. Entries: "0.5-1"; CM8, Chicken meat, BL21, 53.]
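The infectivity percentage defined above and the overall coverage of a strain panel by the phage collection are both simple set calculations. The sketch below illustrates them on a hypothetical ten-strain panel. The function names, phage names and strain names are assumptions, and the only figure taken from the text is the reported coverage of 61 of 72 ECOR strains.

```python
def infectivity_percent(infected: set, panel: set) -> float:
    """Number of infected panel strains divided by panel size, times 100."""
    if not panel:
        raise ValueError("panel must not be empty")
    return 100.0 * len(infected & panel) / len(panel)


def collection_coverage(profiles: dict, panel: set) -> tuple:
    """Return (strains infected by at least one phage, strains infected by none)."""
    covered = set().union(*profiles.values()) & panel
    return covered, panel - covered


# Hypothetical panel of ten strains and three phage host ranges.
panel = {f"strain{i:02d}" for i in range(1, 11)}
profiles = {
    "phageA": {"strain01", "strain02", "strain03", "strain04", "strain05"},
    "phageB": {"strain04", "strain05", "strain06"},
    "phageC": {"strain09"},
}

per_phage = {name: infectivity_percent(hosts, panel) for name, hosts in profiles.items()}
covered, uncovered = collection_coverage(profiles, panel)
print(per_phage)
print(sorted(uncovered))

# Coverage reported in the text: 61 of the 72 ECOR strains.
ecor_coverage = 100.0 * 61 / 72
print(f"{ecor_coverage:.1f}% of ECOR strains")
```

The same calculation applied to the reported 61 of 72 ECOR strains gives a coverage of roughly 85%.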

Phage Particle Morphology
Ten of the nineteen phages (at least one phage representing each of the identified host range groups) were selected for particle morphology assessment by electron microscopy. The phages were observed to belong to one of two morphological groups [35] (Table 2). With the exception of JK16, all isolates belonged to the Myoviridae family, possessing long contractile tails [35]. JK23, JK32, JK36, JK38, JK42, JK45 and CM8 exhibited a T4-like morphology (Figure 2) and displayed similar particle dimensions (Table 2). Phage JK36 possesses exceptionally long baseplate fibers, exceeding 160 nm in length (Figure 2, Table 2). Conversely, phage CM1 possesses a smaller capsid and smaller baseplate fibers than the other Myoviridae phage isolates (Figure 2, Table 2). JK55 particles were damaged despite several attempts; therefore, morphological data pertaining to this phage are not available (Table 2).

[Table 2, surviving row] CM8 | Myoviridae | tl 108 ± 1 (9) | tw 22 ± 1 (9) | hl 114 ± 4 (9) | hw 84 ± 2 (9) | bpw 33 ± 2 (9) | bps 17 ± 1 (9) | fbf 126 ± 11 (8)

The only Siphoviridae phage isolate observed in this study, JK16, had a very flexible, non-contractile tail with (at least) three long tail fibers with distal globular structures (Figure 2). The capsid size varied from 63.7 to 64.7 nm, while the tail length exceeded 150 nm (Table 2). The unique distal globular structures varied slightly in size depending on their localization: the globular structure located beneath the distal tail end measured 9.0 ± 1.3 nm × 8.1 ± 0.9 nm (indicated with a black arrow in Figure 2), while similar structures present on short side fibers measured 11.1 ± 1.5 nm × 7.0 ± 1.1 nm (indicated with white arrows in Figure 2).

Identification of Phage Genetic Lineages
In order to investigate the genetic diversity of the isolated phages, the genomes of the ten selected isolates were sequenced and analyzed. All isolates possessed double-stranded DNA genomes, and the majority displayed identity to previously described phages. Six distinct genetic subgroups were identified based on similarity to the closest database relatives, as determined by BLASTn analysis of the whole genome sequences. Three subgroups were related to the T-even group: (i) T4-even, (ii) RB69-even and (iii) pseudo-T-even. The fourth subgroup (iv), with an rV5-like genome, was represented by phage CM1. The phage with the narrowest host range, JK55, proved to be a close relative of Salmonella phage Felix O1 and was categorized in subgroup (v), the Felix O1-like subgroup. JK16 (assigned to a new subgroup, vi) showed no resemblance to any broadly described phage group and was therefore investigated in more detail, as described later.
Comparative whole genome analysis was performed both between and within the subgroups. In addition, a focused analysis of the receptor binding proteins (RBPs) encoded by the phage isolates was undertaken using BLASTp to reflect the diversity of host interactions, as the RBP-encoding regions are among the most diverse genomic regions within the phage subgroups. In the case of T-even phages, the tail structural region and the receptor binding location have been described previously in detail [23]; we therefore identified the region based on similarity to already known tail structural regions. For the remaining phage groups, HHpred analysis was employed to assign potential RBP functions based on structural homology predictions [33].

T-Even Phages: T4, RB69 and Pseudo-T-Even (RB49-like) Subgroups
Seven of the sequenced phages showed similarity to T-even phages (Tables 1 and 3). The diversity of the T-even family is remarkable: comparisons between various isolates with distinct host ranges show a high degree of diversity in the hypervariable regions [36,37]. The patchwork-like genomes of these viruses consist of stretches of highly variable regions around a conserved core [38]. As mentioned above, we differentiated three subgroups among our T-even phages: T4-even, RB69-even and pseudo-T-even. Three of the ten sequenced phages (JK23, JK38 and CM8) showed over 80% nucleotide similarity to a strain of T4 phage (GenBank accession no. KJ477684.1) [39], and over 90% nucleotide similarity to E. coli phage wV7 (GenBank accession no. HM997020.1). Therefore, these phages were classified in the T4-even subgroup. All of these phages possessed similar genome sizes of almost 170 kb, with a low G+C content (Table 3), which is typical for T4 phages [40,41]. Three phages (JK36, JK42 and JK45) displayed high nucleotide similarity to phage RB69 (GenBank accession no. NC_004928.1). T4 and RB69 share approximately 80% orthologous genes showing more than 80% similarity [41]. These phages had similar G+C content and a similar genome size (Table 3). JK32 possessed the largest genome, exceeding 176 kb, with an average G+C content of 40% (Table 3). BLASTn against the NCBI database showed similarity of this phage to phage RB49 (GenBank accession no. NC_005066.1), which has been classified as a "pseudo-T-even" phage [37]. Figure 3 illustrates the differences in the RBP region organization of the T-even phages.

Figure 3. Comparison of the tail fiber genes of the T-even phages. The percent similarity values between the structural proteins were obtained by BLASTp. Blue arrow: gp34 proximal fiber; orange: gp35 tail fiber hinge; pink: gp36 small distal tail fiber subunit; green: gp37 large distal tail fiber subunit; purple: gp38 tail fiber adhesin. The dark blue arrow present in the pseudo-T-evens indicates the Dc5 ORF. Different shades of gray represent different ranges of protein similarity between the phages. Absence of shaded regions is indicative of ORFs with no shared similarity. All-against-all BLASTp analysis was first undertaken to identify the isolates with the most similarity, and they are placed in order of similarity in the figure. In this comparison, both phages isolated in this paper and reference phages (RB49, wV7 and RB69) are compared. Percentage similarity values indicate the similarity of the proteins between two neighboring isolates.
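The G+C content values cited for these genomes (Table 3) are base-composition percentages. As a minimal sketch of how such a value is computed from a nucleotide sequence, the function below counts G and C among unambiguous bases. The function name and the toy sequence are assumptions for illustration, and the sequence is not taken from any phage genome in this study.

```python
def gc_content_percent(sequence: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    gc_count = sum(1 for base in bases if base in "GC")
    return 100.0 * gc_count / len(bases)


# Hypothetical toy sequence; ambiguous symbols such as N are ignored.
toy_sequence = "ATGCGCATTA"
print(f"{gc_content_percent(toy_sequence):.1f}% G+C")
```

Ignoring ambiguous bases rather than counting them in the denominator is a design choice made here for simplicity; other tools may handle them differently.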

rV5-Like Subgroup
The chicken meat isolate CM1 displayed no similarity to any of the T4 derivatives, while BLASTn analysis revealed similarity to Enterobacteria phage vB_EcoM-FV3 (GenBank accession no. JQ031132.1) and E. coli bacteriophage rV5 (GenBank accession no. DQ832317.1). Based on this analysis, the phage was classified in the "rV5-like" subgroup, which was recently defined [42]. CM1 has a genome size of nearly 140 kb, with a higher G+C content than all T4-related phages (Table 3). Phage rV5 is a derivative of phage V5, one of the phages employed in E. coli O157:H7 typing [43]. The genomics of rV5-like phages has been described previously [42,43]. HHpred analysis was employed to identify the likely RBP of CM1 [33] (Supplementary Table S1). The protein product of the downstream located gene, orf 40 (encoding a protein of 1266 aa), displays significant similarity to the L-shaped tail fiber protein of phage T5 based on HHpred analysis (Probability = 99.94, E-value = 1.7e-10) (Table S1). This tail fiber is predicted to contain receptor binding domains and to be responsible for early host recognition, akin to gp37 in T4 phage [44]. The C-terminus of the orf 40-encoded protein exhibits similarity to a chaperone of Enterobacteria phage K1F (Probability = 99.13, E-value = 1e-12) (Table S1), which is responsible for the correct folding of tail proteins [45]. Another domain located in the C-terminus showed high similarity (Probability = 99.25, E-value = 7e-14) to the RBP of Salmonella phage vB_SenMS16. Therefore, we assume that the receptor binding region is likely encoded by orf 40.

Felix O1-Like Subgroup
JK55 exhibits a relatively small genome (86,219 bp) compared to the isolates mentioned above, with 124 predicted ORFs (Table 3). This phage displays significant similarity to Felix O1 (98% similarity, 93% coverage), a phage mainly known for its use in typing Salmonella [46]. The genome characteristics of Felix O1 have been described previously [46]. HHpred analysis revealed that the product of orf 74 of JK55 phage exhibits similarity to T-even's gp37 (Probability = 99.43, E-value = 1.5e-15), a T-even RBP that was described in detail in previous sections. The orf 74 protein product is therefore proposed to represent the RBP of phage JK55.
The structural region of JK16 encoding the capsid elements encompasses at least five genes (orf 58-62). Based on HHpred, Pfam and BLASTp analysis, we suggest that orfs 58-61 encode head stabilization, scaffolding and/or decoration proteins, while orf 62 encodes the major capsid protein. This protein bears structural homology with the major capsid protein of Pseudoalteromonas phage TW1 (100% probability, PDB 5WK1_E). SDS-PAGE analysis of the structural proteome of JK16 identified three proteins of high abundance. One of these, which runs at approximately 28 kDa, may represent the mature, cleaved version of the major capsid protein, which is predicted to be 35.9 kDa in its unprocessed form (Table 4 and Figure S1). There are three ORFs between the head and tail structural component-encoding regions (orf 63-65) for which functions could not be readily assigned. However, HHpred and Pfam analysis suggests that orfs 64/65 bear some structural/sequence homology to head-tail connector proteins of coliphage HK97 (77.2%, PF05135.13) and Shigella flexneri (95.3%, PDB entry 2K24_A). Ten genes can readily be associated with tail morphogenesis (orf 66, orf 68-75 and orf 84). Detailed analysis using HHpred revealed predicted specific functions of several of the structural gene products (Supplementary Table S3). For example, orfs 67 and 69 are predicted to encode tail assembly chaperone proteins, while orf 68 likely encodes the major tail protein. Based on SDS-PAGE analysis of the structural proteome of JK16, there is a protein of high relative abundance of approximately 25 kDa, which is in keeping with the predicted mass (24.4 kDa) of Orf68 (Figure S1). It is proposed that orf 73 may function as the tail-associated lysin, as identifiable hydrolytic domains were observed in this protein.
Finally, the product encoded by orf 75, which represents the largest protein of JK16 (1192 aa), exhibits similarity to various adhesion domains (data not shown) and to a tail domain of bacteriophage MuSo2 (Table S3); based on these similarities, it is the most likely candidate to represent the RBP of this phage. Interestingly, orf 84, located a few genes downstream of the tail structural region, encodes a putative tail protein (Table S2). The SDS-PAGE profile of JK16 also indicates the presence of a third highly abundant protein, to which a function cannot be readily attributed based on our analysis. However, based on its size, we suggest that it is likely a capsid stabilization protein.
ESI-MS/MS was performed to identify and characterize the JK16 proteins present in purified virions. This analysis identified 16 JK16 structural proteins, which are presented in Table 4. Most of the predicted proteins were located within the structural region (tail and head proteins) or were associated with the DNA packaging module (portal proteins) (Table 4). Almost all tail structural region proteins were detected by the ESI-MS/MS analysis, with the exception of the putative major tail protein (encoded by orf68), the tail assembly chaperone (orf69) and the tail assembly protein (orf73). The latter two are likely non-structural components, while it is unclear why the major tail protein was not detected in this study. This analysis identified 15 JK16 structural proteins, which are presented in Table 4. Most of the predicted proteins were head and tail proteins whose genes were located within the virion morphogenesis region of the genome, which is immediately downstream of the DNA packaging genes (Figure 4, Table 4). An additional protein, gp9, was identified by mass spectrometry; it is unlikely to be a virion structural protein due to its homology to phosphodiesterases. We infer that this protein co-purified with JK16 virions and was identified due to the sensitivity of the mass spectrometry analyses; however, further analyses are required to determine the role of gp9 in the viral life cycle. It is also notable that this protein was detected at levels just above the threshold values for the number of detected peptides and the coverage level.
The complete proteome of JK16 was compared with protein sequences of its close relatives, Escherichia phage vB_Eco_swan01 and Shigella phage pSf-1 (Figure 4). There were only minor regions of significant divergence from Escherichia phage vB_Eco_swan01 including hypothetical proteins (orf18, orf30 and orf33), while elements within the structural module were also observed to exhibit divergence: orf60 (a putative HtjA), orf61 (scaffolding protein) and orf84 (tail structural protein).

Discussion
Bacteriophages are increasingly being acknowledged as a potential tool to control pathogens, including Shigella spp., both in food and in human infections [48][49][50][51][52]. It is a distinct advantage that phages can be relatively easily isolated from a range of environments. In this study we focused on bacteriophage isolation from diverse environments: industrial (sewage), commercial/animal (poultry) and natural (natural environmental water reservoirs). In each sample type a variety of phages were isolated, representing different (sub)groups. The most prevalent group of phages was the T-even group; members of this group were present in all sample types.
Screening for individual phages is a demanding quest, as there is no quick way to select isolates that differ from each other. Here, we used a three-step procedure, which increases the chance of selecting distinctly different phage isolates. Firstly, we selected phages based on varying plaque morphologies; secondly, we differentiated them based on varying host ranges. Finally, ten of the phages were categorized into subgroups using genome sequence analysis. For potential therapeutic application, the phages used in a cocktail should ideally be strictly lytic [9,52]. As all the phages were double plaque purified and propagated on prophage-free laboratory strains, the lysates can be considered pure and free from prophage contamination from the propagating strains. Phage isolation on prophage-free, well-described laboratory strains, and the advantages of such procedures, have been described previously [53,54]. We therefore pursued this strategy to identify and propagate pure, single-phage preparations and to ensure that induced prophages would not contribute to the observed host ranges.
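The second step of this procedure, differentiating isolates by host range, can be illustrated with a minimal sketch. It assumes that each isolate's host range is recorded as the set of strains it infects, and it treats isolates with identical sets as candidates for the same group. The function names and the toy data are hypothetical and are not taken from this study; the actual selection also drew on plaque morphology and genome analysis.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_by_profile(host_ranges: Dict[str, Set[str]]) -> List[List[str]]:
    """Group phage isolates that share an identical host-range profile."""
    groups = defaultdict(list)
    for phage, strains in host_ranges.items():
        # frozenset makes the profile hashable and order-independent
        groups[frozenset(strains)].append(phage)
    return [sorted(names) for names in groups.values()]


def pick_representatives(host_ranges: Dict[str, Set[str]]) -> List[str]:
    """Keep one isolate (alphabetically first) per distinct profile."""
    return sorted(group[0] for group in group_by_profile(host_ranges))


# Hypothetical toy data, for illustration only (not results from this study).
toy = {
    "p1": {"strainA", "strainB"},
    "p2": {"strainB", "strainA"},  # same profile as p1
    "p3": {"strainC"},
}
representatives = pick_representatives(toy)
```

In this toy example, `p1` and `p2` collapse into one group, so only `p1` and `p3` are retained as representatives.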
Host range is a key property for successful phage treatment, as it defines the therapeutic spectrum [10,55]. Phages within our collection showed various host range patterns and varying infectivity, ranging from 14% to 54% of the tested strains (Figure 1). The phage with the narrowest host range (JK55) proved to be a member of the Felix O1 subgroup, which infects Salmonella strains. Thus, the phage collection might have potential use for a variety of applications, both in specific, single-origin infections and in complex infections caused by many different bacterial strains. The finding that our phage collection was able to infect our reference Shigella sonnei strain, as well as the majority of the ECOR library (Figure 1), suggests that these phages may be effective against various serotypes of Shigella spp., ETEC strains, Salmonella strains (especially phages JK55 and CM1) and potentially also more distant members of the enterobacteria, such as Klebsiella [56].
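The two quantities discussed here, the infectivity of a single phage (the fraction of tested strains it infects) and the combined coverage of a collection (the fraction infected by at least one phage), can be computed from a simple presence/absence table. The sketch below is a hedged illustration under that assumption; the phage names, strain identifiers and values are hypothetical placeholders and do not reproduce the data in Figure 1.

```python
from typing import Dict, Set


def infectivity(infected: Set[str], panel: Set[str]) -> float:
    """Fraction of the strain panel infected by a single phage."""
    return len(infected & panel) / len(panel)


def cocktail_coverage(host_ranges: Dict[str, Set[str]], panel: Set[str]) -> float:
    """Fraction of the panel infected by at least one phage in the set."""
    covered = set().union(*host_ranges.values()) & panel
    return len(covered) / len(panel)


# Hypothetical toy data, for illustration only (not the results in Figure 1).
panel = {"s1", "s2", "s3", "s4", "s5", "s6"}
host_ranges = {
    "phageA": {"s1", "s2", "s3"},
    "phageB": {"s3", "s4"},
    "phageC": {"s5"},
}

per_phage = {name: infectivity(hr, panel) for name, hr in host_ranges.items()}
combined = cocktail_coverage(host_ranges, panel)
```

Because host ranges overlap, the combined coverage is generally smaller than the sum of the individual infectivities; it is the union of infected strains that defines the spectrum of a cocktail.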
Ten phages from the obtained collection were subjected to morphological and genomic analysis. Almost all identified phages belonged to the Myoviridae family, with JK16 being the only phage shown to belong to the Siphoviridae family. Genomic analysis revealed that seven out of ten phages showed close resemblance to T-even phages. These T-even-like phages displayed the highest divergence in their RBP region, which determines their host range, consistent with their experimentally determined unique host range profiles (Figures 1 and 3). The only Siphoviridae phage, JK16, was genetically and morphologically distinct from the rest of the isolated phages. One of the phages most closely related to JK16, vB_Eco_swan01, was previously suggested to represent a new bacteriophage species among the Tunavirinae [57], while SECphi27 was also identified as a likely Siphovirus based on identity to vB_Eco_swan01 [58]. However, these phages were not characterized beyond the level of their genomes. Here, we examined the morphology of the phage and confirmed its structural protein content by mass spectrometry, defining this phage as a member of a novel group of phages that may present a useful addition to phage cocktails given their distinct morphology, genome characteristics and likely host interactions. No lysogeny-related functions, such as integrases or repressors, were observed in the genomes of the ten sequenced isolates, which is of importance if such phage isolates are to be employed in phage-based therapies [15]. Importantly, we did not detect any virulence factors or antibiotic resistance genes. These genes are highly undesirable if phages are to be used for therapeutic purposes, as they can be transferred to the bacterial host and cause adverse effects of the therapy [15].
Phage diversity within a cocktail is highly desired, as distinct phages may interact with various bacterial receptors, thereby expanding the therapeutic spectrum. The wide variety of phage-recognized receptors of Gram-negative bacteria has previously been described [19,59]. The outer membrane protein OmpC is known to be an important protein receptor of T4-like phages, since absence of this receptor in the host results in decreased infectivity [59]. The Gp37 tail fiber protein of T4 phage possesses a unique histidine-rich region, which is involved in host recognition [59]. T4 phages also recognize rough-type (R-type) lipopolysaccharide (LPS) receptors, which are common in Shigella spp. [59]. The R-type LPS receptors lack the so-called O-antigen, which is a highly variable region [21,59]. Therefore, the host range of phages recognizing R-type LPS is broader than that of phages recognizing the smooth-type (S-type) LPS, which possesses this hypervariable region [21,59]. Felix O1 phages are LPS-specific [60]. As phage JK55, analyzed in this paper, showed a very narrow host range, we speculate that this particular phage has a preference for the S-type LPS. It remains uncertain how the novel JK16 phage binds to the host cell and which bacterial receptors it recognizes. Given its distinctive morphology and unusual genome organization, it is possible that this phage recognizes a different bacterial receptor. For instance, some phages are able to bind to the distal part of bacterial flagella [59], which seems to be quite a common binding site for enterobacteria-infecting Siphoviridae [19].
Owing to the genetic diversity, the strictly lytic nature of the isolates and their broad spectrum of infection, we feel that this phage collection has therapeutic potential. We aim to develop and optimize a broad-range phage cocktail that can corroborate the concept of phage therapy against Shigella spp. and ETEC infections. Further work is currently being undertaken to assess the antimicrobial properties of such a phage cocktail.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4915/11/10/899/s1, Figure S1: SDS-PAGE analysis of JK16 proteome, Table S1: Putative functions of CM1 tail structural proteins based on HHpred analysis, Table S2: Genes and predicted gene products (predicted by HMMER and BLASTp analysis) of the novel phage JK16, Table S3: Putative functions of a few of the JK16 tail structural proteins.