Comparative Genomics and Characterization of the Late Promoter pR’ from Shiga Toxin Prophages in Escherichia coli

Shiga-toxin producing Escherichia coli (STEC) causes human illness ranging from mild diarrhea to death. The bacteriophage encoded stx genes are located in the late transcription region, downstream of the antiterminator Q. The transcription of the stx genes is directly under the control of the late promoter pR’, thus the sequence diversity of the region between Q and stx, here termed the pR’ region, may affect Stx toxin production. Here, we compared the gene structure of the pR’ region and the stx subtypes of nineteen STECs. The sequence alignment and phylogenetic analysis suggested that the pR’ region tends to be more heterogeneous than the promoter itself, even if the prophages harbor the same stx subtype. Furthermore, we established and validated transcriptional fusions of the pR’ region to the DsRed reporter gene using mitomycin C (MMC) induction. Finally, these constructs were transformed into native and non-native strains and examined with flow cytometry. The results showed that induction levels changed when pR’ regions were placed under different regulatory systems. Moreover, not every stx gene could be induced in its native host bacteria. In addition to the functional genes, the diversity of the pR’ region plays an important role in determining the level of toxin induction.


Introduction
Bacteriophages shape the genome of their prey through horizontal gene transfer, often transferring genes that provide an evolutionary benefit for both the bacterial host and the prophage. There are several examples of this phenomenon in Escherichia coli, including phages that transfer genes that confer virulence or improve the ability of E. coli to survive environmental stress [1][2][3][4]. One such group of genes is the stx genes, which make E. coli toxic to some protist predators but also convert commensal E. coli to human pathogens [5][6][7][8].

Bacterial Strains and Culture Conditions
The STEC strains used in this study are listed in Table 1 [32]. E. coli O104:H4 strain 11-3088 ∆stx::gfp::amp r was used as the reporter strain for DsRed expression; this strain is a derivative of the outbreak strain E. coli O104:H4 in which stx2a was replaced by a gfp::amp r cassette [33]. E. coli strains were routinely grown in Luria-Bertani (LB) medium (BD, Fisher Scientific, Edmonton, CA, USA) at 37 °C with agitation at 200 rpm, or on LB agar plates with 1.5% agar (BD, Fisher Scientific). Ampicillin (50 g/L) and chloramphenicol (100 g/L) were added when required for plasmid maintenance.

Sequence Analysis and Phylogenetic Trees
To scaffold and pair the contigs, the contig(s) containing stx (Table 1) were retrieved, and reference strains with a closed genome were identified by Nucleotide BLAST at the National Center for Biotechnology Information (NCBI) (https://blast.ncbi.nlm.nih.gov/Blast.cgi). To obtain the complete sequence of the target segment, reference genome sequences were downloaded from the NCBI nucleotide database, and contigs were manually aligned with the references and assembled into a larger segment in Geneious (Biomatters, Auckland, New Zealand). Gaps between contigs were filled by Sanger sequencing.
Sequence alignment and phylogenetic analysis of the pR' regions and stx genes were performed in Geneious. To generate the phylogenetic trees, sequences of the pR' region were first aligned using MUSCLE [35], and the resulting alignment was used to build the tree. The stx from Shigella dysenteriae type 1 strain Sd197 (accession number: NC_007606) was included as the outgroup. The parameters "Tree build Method" and "Resampling Method" were set to "Neighbor-Joining" and "Bootstrap", respectively; all other parameters were left at their default values.
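The two inputs that a bootstrapped Neighbor-Joining analysis works from, a pairwise distance matrix and column-resampled copies of the alignment, can be sketched as follows. This is a minimal illustration only: it substitutes the simple p-distance for the substitution model used in Geneious, it does not build the tree itself, and the sequence names and toy sequences are invented for the example.

```python
import random

def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m in names for n in names}

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

# Hypothetical toy alignment (not data from this study).
toy = {"seqA": "ACGTACGT", "seqB": "ACGTACGA", "seqC": "AC-TTCGA"}
d = distance_matrix(toy)
replicate = bootstrap_alignment(toy, random.Random(0))
```

In a full analysis, each pseudo-replicate would be passed through the same distance and tree-building steps, and the support for a branch would be the fraction of replicates in which that branch reappears.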

Nomenclature of Promoter Constructs
The pR' region was defined as the region starting from the last 42 bp of Q and ending with the first 39 bp of stx, to ensure that the pR' from all candidate strains was included. Plasmids containing the different pR' regions were named Pp, followed by the strain number in the Food Microbiology culture collection at the University of Alberta (FUA number). For example, the pUC19-derived plasmid containing the pR' fragment from E. coli FUA1302 was termed Pp1302. Plasmids containing the pR' region from strains with more than one stx gene were denoted by Pp, followed by the FUA number and the abbreviation of the stx subtype. For example, the plasmids containing the two pR' fragments from E. coli FUA1303 were denoted Pp1303-1 and Pp1303-2a, respectively. Plasmids containing promoter regions from E. coli FUA1399, which harbors two stx2a genes, were denoted by the FUA number and the contig number, giving Pp1399-28 and Pp1399-79.
To construct the pR'::rfp::chl r fusion reporter system, the fragments pR', rfp, and chl r were amplified from candidate STEC strains, plasmid pDsRed (Clontech, Mountain View, CA, USA), and plasmid pKD3 [36], respectively. The three fragments were ligated together and cloned into the vector pUC19. The plasmids and primers used are listed in Tables 1 and 2.
Table 2. Primers used for obtaining pR' and rfp fragments.
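The region definition and the naming scheme above can be expressed as a short sketch. The function names, the coordinate convention (0-based, forward strand, Q upstream of stx), and the toy sequence are assumptions made for illustration; they are not taken from the study's workflow.

```python
def extract_pr_region(genome, q_end, stx_start, q_tail=42, stx_head=39):
    """Return the pR' region as defined in the text.

    genome    : nucleotide string (forward strand, Q upstream of stx)
    q_end     : 0-based exclusive end coordinate of Q
    stx_start : 0-based start coordinate of stx
    The slice runs from the last `q_tail` bp of Q through the first
    `stx_head` bp of stx.
    """
    start = q_end - q_tail
    end = stx_start + stx_head
    if start < 0 or end > len(genome) or stx_start < q_end:
        raise ValueError("coordinates do not describe Q upstream of stx")
    return genome[start:end]

def construct_name(fua_number, suffix=None):
    """Build a plasmid name: 'Pp' + FUA number, plus an optional suffix
    (stx subtype abbreviation or contig number) for multi-stx strains."""
    base = f"Pp{fua_number}"
    return base if suffix is None else f"{base}-{suffix}"

# Hypothetical toy sequence: 100 bp of "Q", a 50 bp spacer, 100 bp of "stx".
toy_genome = "A" * 100 + "C" * 50 + "G" * 100
region = extract_pr_region(toy_genome, q_end=100, stx_start=150)
names = [construct_name(1302), construct_name(1303, "2a"), construct_name(1399, 28)]
```

With these toy coordinates the extracted region is 42 + 50 + 39 = 131 bp long, and the three names are Pp1302, Pp1303-2a, and Pp1399-28.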
To validate the fluorescence gene fusion reporter system, DsRed expression by strains harboring the reporter constructs was visualized by fluorescence microscopy with an Axio Imager microscope (Carl Zeiss Canada Ltd., Toronto, ON, Canada). Images were acquired by multi-channel fluorescence imaging with filters for Rhodamine (red fluorescence) and GFP. Cells were grown in LB with a 0.

Determination of the Treatment Conditions for Flow Cytometry Detection
To prevent cell lysis prior to analysis by flow cytometry without interfering with the folding of DsRed, a time course experiment of heat inactivation was performed. The heating was performed at a time when DsRed was produced, but before the expression of phage genes resulted in cell lysis. Cells were induced with MMC (0.5 g/L) when the OD600 reached 0.4 to 0.6 (exponential phase), further incubated for 3 h, and sampled every 30 min. Samples were heated to 60 °C for 5 min, resulting in cell inactivation but not cell lysis [37], and incubated at 4 °C for 27 h, 37 °C for 7 h, or 37 °C for 27 h.
A LSRFortessa™ X-20 cell analyzer (Biosciences, Mississauga, ON, Canada) was used to perform the cell analysis. Fluorescence was excited with a 488 nm argon ion laser, passed through 530/30 and 575/26 nm bandpass filters, and detected by side scatter detectors and a forward scatter detector. To keep the number of detected events per second (e/s) between 300 and 3000 e/s, samples were resuspended and diluted between 1:100 and 20:100 with 1 mL 1× PBS (pH 7.4). Data were recorded with BD FACSDiva™ software (BD Biosciences, San Jose, CA, USA) and analyzed with FlowJo (BD Biosciences, San Jose, CA, USA). The single cell population was defined by selecting the cell population located along the diagonal of the "FSC-A; FSC-H" dot plot, and "cells of favorite" was set as 100% of the singlets in the "FSC-A; SSC-A" dot plot. The gating strategy for the flow cytometric analysis is shown in Figure 1.
To evaluate the induction efficiency, exponential phase cultures were induced with MMC (0.5 g/L), heat inactivated 4.5 h after induction, and measured by flow cytometry 27 h after induction (22.5 h after heat inactivation). The method used for the detection of the fluorescent cell population was the same as described above.
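The logic of the singlet gate and of the percentage of fluorescent cells can be illustrated with a minimal sketch. The gates in this study were drawn in FlowJo; the ratio tolerance, the fluorescence threshold, the field names, and the event values below are all hypothetical and serve only to show the calculation.

```python
def singlets(events, lo=0.8, hi=1.2):
    """Keep events lying near the FSC-A = FSC-H diagonal.

    Doublets have a larger pulse area relative to pulse height, so their
    FSC-A / FSC-H ratio falls outside the (assumed) tolerance window.
    """
    return [e for e in events
            if e["fsc_h"] > 0 and lo <= e["fsc_a"] / e["fsc_h"] <= hi]

def percent_positive(events, channel, threshold):
    """Percentage of events whose signal in `channel` exceeds `threshold`."""
    if not events:
        return 0.0
    positives = sum(e[channel] > threshold for e in events)
    return 100.0 * positives / len(events)

# Hypothetical events (arbitrary units), not data from this study.
events = [
    {"fsc_a": 100, "fsc_h": 100, "rfp": 1500, "gfp": 2000},
    {"fsc_a": 110, "fsc_h": 100, "rfp": 50,   "gfp": 1800},
    {"fsc_a": 95,  "fsc_h": 100, "rfp": 2500, "gfp": 40},
    {"fsc_a": 105, "fsc_h": 100, "rfp": 30,   "gfp": 1200},
    {"fsc_a": 220, "fsc_h": 110, "rfp": 5000, "gfp": 5000},  # doublet-like
]
single = singlets(events)
rfp_pct = percent_positive(single, "rfp", threshold=1000)
gfp_pct = percent_positive(single, "gfp", threshold=1000)
```

Here the last event is excluded by the singlet gate, and the percentages are then computed relative to the remaining singlets, which mirrors setting the gated population to 100%.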

Statistical Analysis
The experiments were repeated at least three separate times (biological replicates). Statistical analysis was performed with SigmaPlot (v.12.5., Systat Software Inc., London, UK) using one-way analysis of variance (ANOVA). A p-value of ≤0.05 was considered statistically significant.
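For readers unfamiliar with the test, the F statistic that a one-way ANOVA compares against the F distribution can be computed as in the sketch below. This is not the SigmaPlot procedure itself; the group values are invented, and the p-value step (which requires the F distribution) is left to a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and n > k observations")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the replicates around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical percentages of fluorescent cells for three transformants,
# three biological replicates each (invented values).
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

For these toy values the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = 13 with 2 and 6 degrees of freedom.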

Sequence Alignment and Phylogenetic Analysis
Previous studies have demonstrated the mosaic nature of stx phages [30,38]. In this study, a phylogenetic analysis was performed to compare the pR' region and stx, to determine whether the phylogeny of stx corresponded to the phylogeny of the pR' region that controls stx and prophage expression (Figure 2). The stx genes of the same subtype were located in the same clade (Figure 2A); stx1 and stx1c were located in two separate clades, whereas genes belonging to the stx2 subtypes were all in the same branch. The phylogeny of the pR' regions was more heterogeneous (Figure 2B) and did not match the phylogeny of the corresponding stx.
The late promoter region, which includes the pR' promoter, is directly upstream of stx and downstream of Q [39]. To assess the sequence diversity, 26 sequences of the pR' region were aligned (Figure 3). The comparison confirmed that the sequences of the pR' regions were highly divergent even if they regulated the same stx subtype (Figure 3). Most of the sequence differences in the pR' regions were caused by single nucleotide changes and not the insertion of a whole flanking region, which suggested the possibility of functional diversity during phage induction [40]. Several pR' regions, including p1402, p1309-2d, p1310-2d, p1306, and p1399-28, lacked the pR' site that was identified in highly virulent strains (Acc. No. AP000400) [41]. To determine the effect of the pR' region on phage induction levels, we selected nine prophages with diverse sequences of stx and the pR' region for subsequent analyses, excluding closely related sequences.

Figure 2. [...] (Table 1). Neighbor-Joining trees were generated in Geneious using the Tamura-Nei model. The reliability of the internal branches was assessed by bootstrapping with 1000 pseudo-replicates. The scale bars represent the number of substitutions per site. Bootstrap values over 70% are displayed. Shigella dysenteriae type 1 strain Sd197 was included as the outgroup. Strains that had significant phylogenetic differences between the pR' region and the stx gene are highlighted by dots and were used in downstream studies. (A) Phylogenetic tree generated by comparing the stx genes, which included both subunits A and B. (B) Phylogenetic tree generated by aligning the pR' region located between Q and stx.

Figure 3. [...] Sequence identities are colored in green, yellow, and red, which indicate that the residue at that position is the same across all sequences, of less than complete identity, and of very low identity, respectively. The schematic stx genes are annotated behind the pR' regions. The sequences that do not have the same pR' site as the reference are shaded. The figure is provided in high resolution for large scale printing or viewing.

Construction and Validation of the pR'::rfp::chl r Transcriptional Fusion
To determine the role of the pR' region in stx expression, we amplified the pR' fragments from 16 strains by PCR and ligated each pR' fragment into the plasmid pUC19. The DsRed reporter gene and the antibiotic resistance gene chl r were introduced into the vector, downstream of the pR' region. The resulting plasmid is depicted in Figure 4 (schematic rings).

To validate the pR'::rfp::chl r transcriptional fusion, E. coli O104:H4 11-3088 ∆stx::gfp::amp r (Pp1302::rfp::chl r ) and E. coli O157:H7 CO6CE900 (Pp1302::rfp::chl r ) were induced with 0.5 g/L MMC for 4.5 h (Figure 5). E. coli O104:H4 11-3088 ∆stx::gfp::amp r was used as the negative control. In this strain, stx was replaced by gfp to visualize protein expression by fluorescence microscopy or flow cytometry [33]. In the absence of the pR' construct, only GFP-positive cells were observed after induction, whereas RFP-positive cells were detected only in the target strain carrying a pR'::rfp construct. Moreover, E. coli O104:H4 11-3088 ∆stx::gfp::amp r (Pp1302::rfp::chl r ) showed both GFP- and RFP-positive cells, which demonstrated that the expression of the chromosomal gfp and the plasmid-borne rfp did not affect each other (p ≥ 0.05).

Detection of Stx Induction Levels in STEC Populations
Since stx is located in the late lytic region [42], Stx induction also induces the lytic cycle and eventually results in cell lysis, which obscures the detection of cells by flow cytometry. Thus, cultures were heat inactivated 4.5 h after MMC induction, followed by incubation at 37 °C for 22.5 h. This protocol enabled the quantification of the proportion of cells expressing GFP or DsRed, or both, by flow cytometry (Figure 1).
To determine the impact of the diversity of the pR' region, we selected 16 transformants that represented various combinations of the pR' and regulatory regions, and measured the induction levels in the presence and absence of MMC by flow cytometry. Initially, we measured the induction level in seven E. coli O104:H4 11-3088 ∆stx::gfp::amp r (PpR'::rfp::chl r ) transformants. Under the control of regulatory proteins of the E. coli O104:H4 11-3088 prophage, transformants carrying the constructs p1302::rfp::chl r , p1303-2a::rfp::chl r , p1399-28::rfp::chl r , and p1399-79::rfp::chl r showed higher DsRed expression; the other transformants did not express DsRed (Figure 6A). GFP expression did not differ among the transformants (Figure 6B) (p ≥ 0.05), indicating that expression of the chromosomal gfp was not influenced by the plasmid-encoded heterologous pR' region.
To investigate the behavior of the pR' region under the control of its parent prophage, we measured the induction level of eight transformants: E. coli FUA1303 (Pp1303-1::rfp::chl r ), E. coli FUA1303 (Pp1303-2a::rfp::chl r ), E. coli FUA1306 (Pp1306::rfp::chl r ), E. coli FUA1309 (Pp1309-1c::rfp::chl r ), E. coli FUA1309 (Pp1309-2d::rfp::chl r ), E. coli FUA1311 (Pp1311::rfp::chl r ), E. coli FUA1399 (Pp1399-28::rfp::chl r ), and E. coli FUA1399 (Pp1399-79::rfp::chl r ) (Figure 7). To determine the induction behavior resulting from the combination of the same pR' and different regulatory regions, we transformed p1302::rfp::chl r into six different strains (Figure 7). We examined the induction levels in E. coli FUA1303, E. coli FUA1309, and E. coli FUA1399, which carry two prophages in their chromosome. The percentage of RFP-positive cells revealed that not all of the prophages can be induced by MMC: Pp1303-1::rfp::chl r and Pp1399-28::rfp::chl r were not induced, and in E. coli FUA1309, neither Pp1309-1c::rfp::chl r nor Pp1309-2d::rfp::chl r was induced. We also compared the induction level of p1302::rfp::chl r in different STECs and found significant differences among the six transformants. The pR' promoter region from FUA1302 was regulated differently in different strains: in E. coli FUA1303, E. coli FUA1311, and E. coli FUA1399, the induction level of Pp1302::rfp::chl r was comparable to that in its native strain, whereas in E. coli FUA1309, the expression was lower (p ≤ 0.05). Additionally, the percentage of fluorescent cells in E. coli FUA1306 and E. coli FUA1311 with the heterologous promoter Pp1302::rfp::chl r was higher than the expression of the same protein under control of the homologous promoter in E. coli FUA1306 (Pp1306::rfp::chl r ) and E. coli FUA1311 (Pp1311::rfp::chl r ) (p ≤ 0.05). Finally, the induction levels of Pp1302::rfp::chl r , Pp1309-1c::rfp::chl r , and Pp1309-2d::rfp::chl r did not differ under the control of the prophages from E. coli FUA1309 (p ≥ 0.05). Taken together, these data demonstrate that the sequence diversity of pR' as well as prophage-encoded regulatory proteins resulted in a concomitant diversity of expression levels.

Figure 7. To determine the effect of the native regulator on the pR' region, the pR'::rfp::chl r constructs were cloned from the target strains and transformed back into their parent strains. To determine whether the same pR' region was differentially expressed in different strains, the construct p1302::rfp::chl r was transformed into all target strains and its parent strain E. coli FUA1302 O104:H4. Transformants were induced with MMC. Bars are grouped by the six target strains; the bars represent the different pR' constructs shown in the figure legend. Bars with the same pattern that do not share a common letter differed significantly (p ≤ 0.05). The percentages of fluorescent cells are shown as means ± standard deviations of quadruplicate independent experiments.

Discussion
STEC genomes have a high degree of sequence diversity [26,43,44,45], and different STECs differ in their virulence, with disease symptoms ranging from mild diarrhea to hemolytic-uremic syndrome leading to death [44]. Sequence diversity in the early regulatory region directly affects stx expression and toxin production [46][47][48], and accounts for differences in virulence. The present study provides evidence that sequence diversity in the late promoter region also contributes to different Stx expression in STEC. As Stx prophages not only confer virulence to STEC, but also convert commensal E. coli to pathogens [49,50], differences in the expression of late phage genes likely result in different degrees of virulence among strains.
Sequence analysis of the pR' region revealed a large number of nucleotide differences. Of the two promoters upstream of stx, the distal promoter pR' controls Stx production [20]. To investigate the genetic relationship between pR' and stx, we conducted a phylogenetic analysis of these two sequences. The stx genes were highly conserved within the stx subtypes, whereas the pR' regions upstream of stx genes of the same subtype were distinct from each other (Figure 3). This is in agreement with previous studies showing that the late gene region of Shiga phages exhibits considerable genetic diversity [30,42] and that the emergence of STECs in E. coli cannot be predicted from the serotypes [51].
Induction efficiency is positively correlated with Stx production and pathogenicity [44,52,53]. To determine the effect of the diversity in the late promoter region on the behavior of STECs, we transformed pR'::rfp::chl r constructs with representative promoter sequence structures into different target strains and quantified gene expression with fluorescent reporter proteins. Bacterial behavior is commonly assessed in bulk [51,52]. To account for stochastic switching during detection [54], we employed flow cytometry, which allows efficient measurement at the single-cell level [33,34]. As one of the most commonly used inducers, MMC was chosen to induce cultures in this research. However, lambdoid phages show different induction efficiencies in response to different induction agents [52]. Thus, it is possible that the efficiency of induction may change when other induction agents are used.
The use of pR' from seven different Stx prophages to control DsRed expression in E. coli O104:H4 11-3088 ∆stx::gfp::amp r demonstrated that the sequence diversity of the pR' region corresponded to different levels of gene expression. E. coli O157:H7 harboring stx2 under the control of Q21 rather than Q933 may exhibit a Stx2-negative phenotype [55]. The present study confirmed that prophage-encoded regulatory proteins impact Stx expression, as the same construct showed different expression levels in different strains. However, the prophages in E. coli FUA1302 and E. coli FUA1311 both harbored the typical pR' site [41] and the highly conserved Q933 [23], yet the induction efficiencies of Pp1302::rfp::chl r and Pp1311::rfp::chl r were different under the control of the E. coli FUA1302 prophage. We thus propose that the Q and pR' sites are not the only determinants of induction efficiency of the late transcript region; sequence diversity in the late promoter region pR' [26] also regulates induction efficiency. Moreover, the similar GFP populations among samples indicate that the expression of the plasmid-borne rfp did not interfere with the regulation of the chromosomal gfp.
A specific sequence of the pR' site (accession number: AP000400) [41] is related to high Stx production [27,40], and we therefore used this reported pR' site as the reference to investigate our candidate pR' sites. The reference pR' site was not found in the candidate prophages from E. coli FUA1306, E. coli FUA1309, and E. coli FUA1399, and the constructs that lack the reference pR' site did not express DsRed after induction. Additionally, it seems that different types of pR' sites combine randomly with different stx genotypes: Pp1399-28::rfp::chl r has the same stx2a as Pp1399-79::rfp::chl r , but a different pR' site. Another finding is that the induction level of Pp1303-1::rfp::chl r , which harbors the same pR' site as the reference sequence, did not increase significantly. Typically, strains with the reference pR' site have a higher expression level; this phenotype might relate to a change in the binding ability of RNA polymerase to the prophage DNA and Q [56], which would affect phage metabolism and physical behavior during lysis.
The presence of two or more stx prophages was proposed to increase the pathogenicity of STEC by changing the toxin expression [57]. However, other research has reported that lysogens with more than one phage produce less toxin [58]. In this study, prophages 1399-28 and 1399-79 of E. coli FUA1399 carry the same stx2a, which is related to a high rate of HUS [59]. While Pp1399-79::rfp::chl r was highly induced, Pp1399-28::rfp::chl r was not induced. This indicates that expression of the Shiga toxin in a STEC is not determined by the number of Stx prophages, but by the expression levels that are controlled by the interaction of the regulatory Q protein(s) and the pR' site.
Genetic exchange through phages generates genomic diversity and promotes the evolution of the host bacteria. Such gene transfer helps bacteria survive in the diverse environments in nature, but also gives the chance for bacteria to gain virulence determinants from pathogenic strains, thus generating new pathogens [3,7,45,60,61]. As a food-borne pathogen, E. coli gaining stx during evolution has a substantial impact on human health. Beef cattle are a main source of STEC transmission to humans, either directly through the meat supply or indirectly through contamination of water and plant foods [62,63]. Predatory protists are proposed to exert a selective pressure for maintenance of the Shiga-toxin prophage by commensal E. coli in ruminants [7]. It is tempting to speculate that the sequence diversity of Shiga-toxin prophages responds to the diversity of predatory protozoa in the gut microbiome of ruminants [64]. Understanding the link between genomic diversity of Stx prophages and Stx production may provide solutions to predict and prevent STEC contamination in ruminants and human STEC infections.

Conclusions
In this study, the phylogenetic analysis of stx confirmed previous findings that the sequence of stx is highly conserved. However, the phylogenetic analysis of the pR' region revealed that this late promoter region was more heterogeneous. The combination of the fluorescent reporter fusion system and flow cytometric analysis confirmed that toxin expression could be observed at the single-cell level. Our data from the phylogenetic analysis and the determination of toxin expression levels of the pR'::rfp::chl r transformants indicated a correlation between the diversity of the late promoter pR' region and the efficiency of toxin expression. These results provide evidence that, in addition to the diversity of the functional genes, the diversity of the late promoter pR' region also contributes to the level of toxin expression.