The emergence of bacteria resistant to all currently available antibiotics is, and has long been, managed by modifying existing drugs to regain effectiveness against resistant bacteria [1
]. Drug development, however, cannot keep up with bacterial evolution, as the environment is under a constant load of antimicrobials, causing the persistence and spread of resistance genes [2
]. The development of new antibiotic classes is expensive and often futile, as the majority of promising candidates do not make it through clinical testing. Therefore, new sources of antimicrobial drugs are in high demand [3
], such as the lytic proteins that bacteriophages produce during the infection cycle. These include virion-associated peptidoglycan hydrolases and endolysins acting in conjunction with holins, which have already been used to some extent as antibacterials [4
Phage genomes also encode numerous proteins for which the functions and structures are still unknown. These proteins are annotated in the sequence databases as “hypothetical proteins with unknown function” (HPUFs). Based on previous studies, many HPUFs include small molecules with toxic properties (toxHPUFs) that phages utilize during the infection cycle to hamper the cellular functions and defense mechanisms of the bacterial host [6
]. Traditionally, toxic gene products have been detected by the inefficient transformation of bacterial cells with a plasmid carrying the toxin-encoding gene [6
]. In the plating efficiency assay, the number of transformants obtained with the HPUF-encoding genes is compared to the number of transformants obtained with control genes encoding known toxic and non-toxic proteins. In practice, the PCR-amplified HPUF genes are ligated into a plasmid vector and electroporated individually into host bacteria; the resulting transformants are enumerated. This approach includes several difficult-to-standardize steps that all cause day-to-day, experiment-to-experiment, and batch-to-batch variation, even when the toxic and non-toxic controls are included in every batch. The reproducibility can be improved by increasing the number of replicates; however, such an approach does not allow high throughput without robotics.
In the next-generation-sequencing (NGS)-based assay described in this study, most of these issues are avoided by transforming all the selected genes simultaneously as a pooled ligation mixture. Using NGS read coverages, the relative abundance of correctly ligated inserts can be determined both for the pooled ligation mixture and for the plasmids isolated from the pooled transformants. Here, we describe and validate the NGS screening assay by screening five known toxic and four non-toxic genes of phages fHe-Kpn01 [10], φR1-RT [8], and T4 [12] and 32 HPUF-encoding genes of Escherichia phage fHy-Eco03 in a comparative study with the plating assay [10
2. Materials and Methods
2.1. Bacterial Strains, Bacteriophages and Culture Media
The bacteriophages and bacterial strains used in this study are listed in Table 1 and Table S1. Plasmid pU11L4 used for cloning in this study was isolated from E. coli strain DH10B/pU11L4. The electrocompetent DH10B and DH5α cells were prepared as described previously [13]. All bacterial strains used in this study were grown on LB agar (1.5%, w/v) or in LB broth supplemented with 100 µg/mL ampicillin (Sigma-Aldrich, St. Louis, MO, USA) unless mentioned otherwise, at 35 °C or 37 °C. Transformed E. coli
DH10B cells were grown in SOC medium (0.5% yeast extract, 2% tryptone, 10 mM NaCl, 2.5 mM KCl, 10 mM MgCl₂, 10 mM MgSO₄, 20 mM glucose, pH 7.0) and transformed DH5α cells were grown in M9t minimal medium (2.2 mM KH₂PO₄, 3.4 mM Na₂HPO₄, 0.94 mM NH₄Cl, 0.86 mM NaCl, 0.2% (w/v) tryptone, 2.0 mM MgSO₄, 0.10 mM CaCl₂ and 3.0 × 10⁻³ mM vitamin B1).
2.2. Phage Isolation and Purification
fHy-Eco03 was isolated from a municipal sewage sample collected in Hyvinkää, Finland, using clinical E. coli strain #5509 (Table 1 and Table S1) as the host. Strain #5509 was used to propagate the phage. Phage lysates were produced from semiconfluent soft-agar plates as described elsewhere [17]. The phage lysates were then concentrated, purified using ultracentrifugation with a glycerol step gradient [18], and stored at 4 °C as described earlier [10
2.3. Electron Microscopy
Phage particles were sedimented by centrifugation for 2 h (16,000× g at 4 °C) and resuspended in 0.1 M ammonium acetate (pH 7.2). The phage particles were then allowed to sediment onto 200-mesh Formvar-coated copper grids for 1 min. Negative staining was done using 1% uranyl acetate at pH 4.2 (method modified from [19]). A JEOL JEM1400 electron microscope, operated at 80 kV, and an Olympus Morada CCD camera were used to image the phage particles (Department of Virology, University of Helsinki, Helsinki, Finland).
2.4. Host Range Determination
The host range of fHy-Eco03 on 50 E. coli strains (Table S1) was determined by pipetting 10 μL droplets of serial dilutions of concentrated phage stocks onto lawns of the different bacterial strains prepared on LB agar plates. The plates were incubated overnight at 37 °C. Positive droplet test results were confirmed with the double-layer method using appropriately diluted phage preparations.
Infection growth curves on phage-sensitive E. coli strains were determined as follows. An overnight bacterial culture was diluted 500-fold in fresh LB medium, and 180 μL aliquots were distributed into the wells of Bioscreen Honeycomb 2 plates (Growth Curves Ab Ltd., Helsinki, Finland), where they were mixed with 20 μL of different fHy-Eco03 phage stock dilutions. The phage stock and bacterial culture were mixed to achieve multiplicity of infection (MOI) values ranging between 0.5 and 500. The controls consisted of 180 μL of bacterial culture and 20 μL of fresh LB medium, or of 200 μL of fresh LB medium alone. The growth experiment was carried out at 37 °C using a Bioscreen C incubator (Growth Curves Ab Ltd., Helsinki, Finland) with continuous shaking. The OD600 of the cultures was measured every 45 min for up to 15 h. The averages were calculated from values obtained for bacteria grown in at least triplicate wells.
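The MOI arithmetic for the 180 μL culture plus 20 μL phage format can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names are hypothetical, and the bacterial density in the usage note is a placeholder, since the density of the diluted culture is not stated here.

```python
def moi(phage_pfu_per_ml, cells_cfu_per_ml, culture_ul=180.0, phage_ul=20.0):
    """Multiplicity of infection: infectious phages per bacterial cell in one well."""
    phages_in_well = phage_pfu_per_ml * phage_ul / 1000.0
    cells_in_well = cells_cfu_per_ml * culture_ul / 1000.0
    if cells_in_well <= 0:
        raise ValueError("bacterial density must be positive")
    return phages_in_well / cells_in_well


def required_phage_titer(target_moi, cells_cfu_per_ml, culture_ul=180.0, phage_ul=20.0):
    """Phage stock titer (PFU/mL) needed to reach target_moi in one well."""
    cells_in_well = cells_cfu_per_ml * culture_ul / 1000.0
    return target_moi * cells_in_well / (phage_ul / 1000.0)
```

For example, with an assumed density of 1 × 10⁶ CFU/mL in the diluted culture, `required_phage_titer(0.5, 1e6)` gives the stock titer that yields an MOI of 0.5, and `moi(...)` can be used to check a prepared dilution series against the intended range.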
2.5. Genome Sequencing and Analysis
Phage DNA was obtained from high-titer phage preparations as described earlier [17] and sequenced at Eurofins Genomics. The next-generation sequencing DNA library (insert size of 625 ± 311) was paired-end sequenced using an Illumina MiSeq sequencer (Illumina, San Diego, CA, USA) with a read length of 150 nucleotides. The A5-miseq integrated pipeline for de novo assembly of microbial genomes was used to assemble the genome sequence [20]. The termini of the phage genome were identified using PhageTerm [21] and confirmed by restriction digestions and comparisons to related phages. The orientation of the genome was arranged similarly to the sequences of closely related homologs as found in a nucleotide BLAST search. The genes were annotated with RAST software [22] and validated manually, confirming also that the predicted genes were accompanied by a properly located ribosomal binding site. Geneious Prime v 11.1.5 [23] was used for visualization of the phage genome.
A protein BLAST search against the non-redundant protein sequences database (release update from 6 February 2021) was performed for every predicted gene product, and the two results with the lowest E-values were recorded (Table S2). Furthermore, every gene product was analyzed using HHpred [24], and the best hits with a probability above 50% and an E-value below 1 were recorded (Table S2). The presence of tRNAs was investigated using tRNAscan-SE [25]. In addition, ResFinder 3.1 [26] and VirulenceFinder 2.0 [27] were used to screen the genome for antimicrobial resistance genes and virulence genes, respectively. Phylogenetic trees of complete phage genomes at the nucleotide level were constructed using VICTOR [28]. The complete genome sequence with annotation was deposited in the NCBI nucleotide database (GenBank) under accession number MW602648.
The protein content of purified phages (as tryptic peptides) was analyzed using liquid chromatography–tandem mass spectrometry (LC-MS/MS) at the Proteomics Unit, Institute of Biotechnology, University of Helsinki, as described earlier [8]. Calibrated tryptic peptide peaks were searched against the predicted tryptic peptides from the amino acid sequences of all open reading frames (ORFs) in the genome of fHy-Eco03, including the non-annotated ones. Proteins identified by LC-MS/MS analysis with two or more unique tryptic peptides and over 5% sequence coverage were annotated as phage (structural) proteins.
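The annotation criterion above (two or more unique tryptic peptides and over 5% sequence coverage) amounts to a simple filter over the search results. The sketch below is illustrative only; the `ProteinHit` record and its field names are hypothetical and do not correspond to the output format of any particular search engine.

```python
from dataclasses import dataclass


@dataclass
class ProteinHit:
    orf: str                 # ORF identifier (hypothetical naming)
    unique_peptides: int     # number of unique tryptic peptides matched
    coverage_percent: float  # sequence coverage in percent


def is_particle_protein(hit, min_unique_peptides=2, min_coverage_percent=5.0):
    """Apply the stated thresholds: >= 2 unique peptides and > 5% coverage."""
    return (hit.unique_peptides >= min_unique_peptides
            and hit.coverage_percent > min_coverage_percent)


def particle_proteins(hits):
    """Return the ORF identifiers of all hits that pass both thresholds."""
    return [hit.orf for hit in hits if is_particle_protein(hit)]
```

A hit must pass both thresholds; a protein with many peptides but coverage of exactly 5% is excluded because the criterion is stated as "over 5%".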
2.7. DNA Methods
Restriction digestions of purified fHy-Eco03 DNA were performed with restriction endonucleases NcoI, NotI, ScaI, SphI (Thermo Fisher Scientific, Waltham, MA, USA), AflII, EagI, Sau3AI, and SmaI (New England Biolabs, Ipswich, MA, USA) in appropriate digestion buffers.
All plasmid isolations were performed with a commercial Nucleobond Xtra Midi kit (MACHEREY-NAGEL, Düren, Germany) according to the protocol for high-copy number plasmids. Plasmid pU11L4 was double digested with restriction enzymes NotI and NcoI, or with NheI and NotI (Thermo Fisher Scientific, USA) if an internal NcoI site was present in the sequence of the gene in question (Tables S3 and S4). Toxic and non-toxic control genes (Table S3) and the HPUF-encoding genes of phage fHy-Eco03 (Table S4) were cloned as PCR fragments into the multiple cloning site of plasmid pU11L4 (Figure S1) with a three-fold molar excess of the insert.
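The mass of insert needed for a given molar excess follows from the length ratio of insert to vector. The sketch below uses the standard mass-to-mole proportionality for double-stranded DNA; the function name is hypothetical, and the vector length and DNA amounts in the usage note are placeholders, because the size of pU11L4 and the ligation input are not given here.

```python
def insert_mass_ng(vector_ng, vector_bp, insert_bp, molar_ratio=3.0):
    """Mass of insert (ng) giving `molar_ratio` insert molecules per vector molecule.

    For double-stranded DNA, moles are proportional to mass / length, so
    insert_ng = vector_ng * (insert_bp / vector_bp) * molar_ratio.
    """
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("fragment lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * molar_ratio
```

For instance, with an assumed 4000 bp vector, 50 ng of vector and a 300 bp insert, `insert_mass_ng(50, 4000, 300)` returns the insert mass for a three-fold molar excess. Because short HPUF genes need proportionally little DNA, inaccurate quantification at this step can shift the representation of individual genes in a pooled ligation.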
The recombinant plasmids were transformed into electrocompetent E. coli DH10B cells, which had a transformation efficiency of approximately 10⁹ CFU per µg of intact pU11L4 plasmid. The electroporation was performed in 0.2 mm cuvettes (Bulldog Bio, Portsmouth, NH, USA) by combining approximately 10 ng of the recombinant vector with 45 µL of electrocompetent DH10B cells. The pulse was given with a Gene Pulser™ apparatus (Bio-Rad Laboratories, Hercules, CA, USA), with the settings of 200 Ω, 25 µF and 2.5 kV. Transformed cells were suspended immediately in 950 µL of SOC medium and incubated at 35 °C for 1 h with slow rotation; afterwards, 50 µL was pipetted onto LB ampicillin plates. Plates were incubated overnight at 37 °C. Transformations of the HPUF-encoding genes of fHy-Eco03 were done in batches of 4 to 6 genes with the g178 gene of phage φR1-RT as a non-toxic control in each batch [8]. The relative CFUs were determined from triplicate platings of two biological replicates per gene.
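One way to express the relative CFU of a gene is as the ratio of its mean colony count to the mean count of the non-toxic control plated in the same batch. The sketch below assumes this normalization, which is not spelled out in the text, and uses a hypothetical function name; the colony counts in the usage note are invented for illustration.

```python
from statistics import mean


def relative_cfu(gene_counts, control_counts):
    """Mean colony count of a gene relative to the non-toxic control of its batch.

    Both arguments are lists of colony counts, e.g. six values from
    triplicate platings of two biological replicates.
    """
    control_mean = mean(control_counts)
    if control_mean == 0:
        raise ValueError("control gave no colonies; the batch cannot be normalized")
    return mean(gene_counts) / control_mean
```

With illustrative counts, `relative_cfu([10, 12, 8, 11, 9, 10], [100, 110, 90, 105, 95, 100])` returns 0.1, meaning the gene yielded one tenth of the transformants obtained with the control. Normalizing within a batch compensates for batch-to-batch differences in electroporation efficiency but not for variation between individual ligations.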
For the NGS screening assay, the individual fHy-Eco03 HPUF-encoding gene ligations were combined into pooled samples in two batches, each containing all the HPUF-encoding genes and no control genes. The pooled ligation mix samples were purified and concentrated with a NucleoSpin® Gel and PCR Clean-up kit (MACHEREY-NAGEL, Germany) according to the manufacturer's instructions and eluted in 20 µL; 1 µL was used for electroporation as described above. After the initial 1 h incubation in SOC, the whole suspension was spread in 50 µL aliquots on twenty LB ampicillin plates and grown overnight at 37 °C. All the transformant colonies obtained were pooled, and the pool was grown with aeration to the late logarithmic phase in ampicillin-supplemented SOC (3 h at 37 °C). Plasmid isolation from the culture was carried out as described above using the Nucleobond Xtra Midi kit.
The NGS-based screening approach is outlined in Figure 1. The ligation mixture and transformant samples containing plasmid DNA were sequenced with the 150 bp paired-end protocol on the Illumina HiSeq platform at NovoGene (UK). Successful ligation between the PCR fragment carrying the HPUF gene and the plasmid vector generates ligation joints that are sequenced from both strands in NGS, resulting in four kinds of sequence reads over the ligation joints (Figure 1
The raw sequencing read data were screened for the presence of the four expected sequences for each gene. To identify these reads, we generated in silico a list of ligation-joint covering sequences that included 15–25 nucleotides of specific sequence from both sides of the ligation joint (Figure 1b). The number of reads over the joints (joint-reads) reflects the number of intact and correctly ligated genes in the samples. The relative number of joint-reads of a specific gene was then calculated as a percentage of the total joint-reads in the sample. A toxic gene can be identified by a significant reduction in the relative number of the gene-specific joint-reads in the transformant plasmid pool compared to the relative number in the pooled ligation mixture. The complete pipeline of the bioinformatics analysis is described in detail in Table S5. The sequence analysis was performed using the Puhti computer at CSC (the Finnish Centre for Scientific Computing).
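The joint-read logic described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline of Table S5: reads are matched by exact substring search, all function names are hypothetical, and no significance threshold is applied because none is specified here.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def joint_sequences(vector_left, insert, vector_right, flank=20):
    """Four expected joint sequences: both ligation joints on both strands.

    `flank` nucleotides are taken from each side of a joint
    (the text uses 15-25 nucleotides).
    """
    left = vector_left[-flank:] + insert[:flank]
    right = insert[-flank:] + vector_right[:flank]
    return [left, right, reverse_complement(left), reverse_complement(right)]


def count_joint_reads(reads, joints_by_gene):
    """Count, per gene, the occurrences of its joint sequences in the reads."""
    counts = {gene: 0 for gene in joints_by_gene}
    for read in reads:
        for gene, joints in joints_by_gene.items():
            counts[gene] += sum(1 for joint in joints if joint in read)
    return counts


def relative_percent(counts):
    """Joint-reads of each gene as a percentage of all joint-reads in the sample."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no joint-reads found in the sample")
    return {gene: 100.0 * n / total for gene, n in counts.items()}


def abundance_ratio(ligation_counts, transformant_counts):
    """Relative abundance in the transformant pool divided by that in the ligation mix.

    Genes absent from the ligation mix are skipped, since they cannot be evaluated.
    """
    ligation = relative_percent(ligation_counts)
    transformants = relative_percent(transformant_counts)
    return {gene: transformants[gene] / ligation[gene]
            for gene in ligation if ligation[gene] > 0}
```

A ratio well below 1 for a gene indicates that its correctly ligated construct was depleted among the transformants, which is the signature expected for a toxic gene product. Skipping genes that are missing from the ligation mixture matters in practice, because a ligation that drops out of the pool cannot be scored.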
2.9. Protein Function and Sequence Analysis
The predicted functions and structures of protein sequences were obtained by modeling the protein sequence with Phyre2 software [30
] and aligning against protein sequence databases with BLASTx [31
2.10. Confirmation of Toxicity by Growth Curve Analysis
The putative toxHPUF-encoding genes were selected for further confirmation of toxicity. The genes were cloned into the multiple cloning site of the arabinose-inducible plasmid pBAD30. Plasmid and genes were digested with the KpnI and XbaI enzymes or SphI (Thermo Fisher Scientific, Waltham, MA, USA) if an internal XbaI site was present in the insert. An aliquot of each ligation mixture was transformed into electrocompetent E. coli DH5α cells as described earlier. Transformant colonies were selected and grown overnight in LB medium supplemented with ampicillin and glucose (0.2% w/v).
The constructs were confirmed by Sanger sequencing at the Finnish Institute for Molecular Medicine (FIMM). After overnight incubation, the cells were washed with M9t minimal medium and 10 µL was inoculated into M9t medium supplemented with ampicillin and either glucose (0.2% w/v) or arabinose (0.2% w/v). Bacterial cells were distributed into Bioscreen Honeycomb plates and the OD600 was measured every hour for 20 h with a Bioscreen C MBR (Oy Growth Curves Ab Ltd., Helsinki, Finland). Average ODs and standard deviations were calculated from triplicate wells of three biological replicates per gene. As controls, E. coli strains carrying plasmids containing the toxic or non-toxic genes of phage φR1-RT (g137 and g150), or the empty vector, were used.
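The averaging of growth curves can be sketched as a per-time-point mean and sample standard deviation across wells. The sketch assumes that all wells of a condition (for example, triplicate wells of three biological replicates) are pooled into one list of curves, which is one possible reading of the description; the function name is hypothetical.

```python
from statistics import mean, stdev


def summarize_growth(curves):
    """Per-time-point (mean, sample SD) of OD600 across replicate wells.

    `curves` is a list of OD600 series, one per well, all measured at the
    same time points. At least two wells are required for the SD.
    """
    if len(curves) < 2:
        raise ValueError("at least two wells are needed")
    if len({len(curve) for curve in curves}) != 1:
        raise ValueError("all wells must have the same number of time points")
    return [(mean(point), stdev(point)) for point in zip(*curves)]
```

Comparing the summary of the arabinose-induced cultures with that of the glucose-repressed cultures of the same construct then shows whether induction of the cloned gene inhibits growth beyond the spread of the replicates.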
In this study, we describe a new NGS-based screening assay for the detection of bacteriophage-encoded toxic proteins. The performance of the NGS screening assay was compared with the previously used plating assay by screening known toxic and non-toxic genes and the HPUFs of phage fHy-Eco03. Among the fHy-Eco03 HPUFs, Gp05 was found to be toxic in both screening assays and in the growth inhibition assay. The amino acid sequence of Gp05 showed 68–78% identity to five HPUFs of Salmonella phages. Although fHy-Eco03 was isolated on E. coli, Escherichia and Salmonella bacteria are closely related; it is therefore not surprising that the structure and function of Gp05 and the Salmonella phage proteins were similar. The N-terminal half of Gp05 was best modeled by Phyre2 against the fold present in RecG, an ATP-dependent helicase. The RecG protein was first described for E. coli.
It functions in resolving stalled replication forks, and RecG homologs can be found in a majority of bacterial species. RecG is an essential, multifunctional protein involved in several DNA repair and replication-related pathways; thus, it would be a good target for the bacteriophage-mediated arrest of cellular functions [36]. It is tempting to speculate that the toxicity of Gp05 could be mediated by its ability to interfere with RecG activity. To further characterize the structure, mechanisms, and cellular targets of Gp05, comprehensive protein–protein interaction studies are required. This involves expression and purification of the toxic protein, which is often challenging because of possible toxic and deleterious effects on the bacterial production host. In such a case, yeast or plant cells could be used as an alternative production host, or the toxic properties and targets could be identified through an in vitro translation approach.
Hypothetical proteins have previously been screened with interaction studies between known bacterial targets and hypothetical proteins [11], or by cloning shotgun sheared genomes [9] and annotated hypotheticals [6] into inducible expression vectors. Van Den Bossche et al. discovered eight bacteriotoxic proteins by screening 32 hypothetical proteins of seven different phages against cellular targets of Pseudomonas aeruginosa by utilizing affinity chromatography [11]. However, since toxic proteins are discovered through interactions with specific targets, proteins with unknown cellular targets are inevitably overlooked. In a study conducted by Singh et al., seven Siphovirus genomes were sheared and cloned to inducible vectors and screened for toxic effects in two Mycobacterium
]. Two toxic proteins and two synthetic peptides were discovered with this approach. The peptides, however, had unnatural ORFs, which illustrates a disadvantage of this method: using sheared genomes increases the risk of producing vector constructs with either fragmented ORFs or incorrectly oriented genes, and these can easily be interpreted as false positives or negatives. A similar expression vector-based approach was used by Liu et al., with the difference that true hypotheticals were first identified for screening. With this method, 31 protein families with toxic properties were identified from 27 different Staphylococcus
The plating assay used in the present study as a reference method is similar to the approach used by Liu et al. [6
]. Since many phage particle-associated proteins (PPAPs) are not identified by bioinformatics tools, we first identified them by LC-MS/MS analysis carried out on purified phage particles, and only then were the remaining HPUFs screened for toxic ones using the plating efficiency assay [8]. The HPUF-encoding genes were cloned into an expression vector and transformed into electrocompetent E. coli cells by electroporation. Toxic properties were identified by comparing transformation efficiencies against those of cells transformed with a vector containing a non-toxic control gene. Although this method minimizes the number of genes to be screened, it still has several drawbacks that reduce the reliability and repeatability of the results, as many steps of the screening approach are hard, if not impossible, to control and standardize. Firstly, surviving transformants can be generated by undigested or self-ligated plasmids, or by gene fragments that are incorrectly ligated to the vector plasmid. Secondly, the electroporation conditions under which the cells are transformed cannot be reliably controlled and are thus not ideal for quantitative experiments such as this. Finally, the results are affected by variations in temperature, pipetting, and plating. To increase the reliability and reproducibility, the HPUFs had to be screened in several replicate batches together with control genes. While this did decrease the variation, the required resources and time increased substantially.
The performance of the NGS screening assay was first tested by conducting an experiment with five non-toxic and five toxic genes of phages fHe-Kpn01 [10], T4 [39], and φR1-RT [8]. The plating efficiency assay confirmed that RegB of T4, Gp137 of φR1-RT, and Gp10, Gp22, and Gp38 of fHe-Kpn01 were toxic. In the parallel NGS-based screening assay, however, only Gp10 and Gp22 showed toxicity. The regB ligation dropped out of the pooled ligation mixture because of an unknown technical error, leaving only the unexpected results of Gp38 and Gp137 unaccounted for. The possibility of co-transformation of an anti-toxin expressing gene was already discussed for Gp137 (Section 3.4), and this could also be the case for Gp38. It must therefore be noted that the interpretation of the NGS-based assay results may face as yet unknown challenges. In the following NGS-based assays, we screened the HPUF-encoding genes of phage fHy-Eco03. The results of the plating efficiency and the NGS-based screening assays agreed to a great extent (Table 3), and the same gene products presented toxic and non-toxic effects in both assays, with g05 identified as the most toxic in both assays (Table 3), as well as later in the growth inhibition assay (Figure 4). Despite broadly similar results, the variation between replicates in the NGS screening assay was substantially lower than that of the plating assay (Table S9
The NGS-based screening assay was designed to increase reliability and significantly reduce the time and resources required for the hands-on lab work and further analysis of the results. This was enabled by performing the lab work and bioinformatics analysis of the preliminary screening simultaneously for all genes. As seen from the results, variation between the obtained sequence coverages of replicate transformations and even biological replicates was very small.
In conclusion, we have introduced here an efficient NGS-based screening assay for the identification of toxHPUFs. The performance of the assay still relies on very careful laboratory work when preparing the plasmid vector and the PCR-amplified DNA fragments for ligations.