Genomic Analysis of Shiga Toxin-Producing E. coli O157 Cattle and Clinical Isolates from Alberta, Canada

Shiga toxin (stx) is the principal virulence factor of the foodborne pathogen Shiga toxin-producing Escherichia coli (STEC) O157:H7 and is associated with various lambdoid bacteriophages (phages). A comparative genomic analysis was performed on STEC O157 isolates from cattle (n = 125) and clinical (n = 127) samples to characterize virulence genes, stx-phage insertion sites and antimicrobial resistance genes that may distinguish strains circulating in the same geographic region. In silico analyses revealed that O157 isolates harboured the toxin subtypes stx1a and stx2a. Most cattle (76.0%) and clinical (76.4%) isolates carried the virulence gene combination of stx1, stx2, eae and hlyA. Characterization of stx1- and stx2-carrying phages in assembled contigs revealed that they were associated with the mlrA and wrbA insertion sites, respectively. In cattle isolates, the mlrA and wrbA insertion sites were occupied more often (77% and 79% of isolates, respectively) than in clinical isolates (38% and 1.6% of isolates, respectively). Profiling of antimicrobial resistance genes (ARGs) in the assembled contigs revealed that 8.8% of cattle (11/125) and 8.7% of clinical (11/127) isolates harboured ARGs. Eight antimicrobial resistance gene cassettes (ARCs) were identified in 14 isolates (cattle, n = 8; clinical, n = 6), with streptomycin resistance genes (aadA1, aadA2, ant(3’’)-Ia and aph(3’’)-Ib) being the most prevalent in ARCs. The profound disparity between cattle and clinical strains in occupancy of the wrbA locus suggests that this trait may serve to differentiate cattle from human clinical STEC O157:H7. These findings are important for stx screening and stx-phage insertion site genotyping, as well as for monitoring ARGs in isolates from cattle and clinical samples.


Introduction
Shiga toxin-producing Escherichia coli (STEC), especially O157:H7, is an important food- and waterborne pathogen. Cattle are considered asymptomatic carriers of STEC O157:H7. STEC O157 strains resistant to streptomycin, sulfisoxazole and tetracyclines are commonly associated with isolates from commercial feedlots [35] and clinical [36,37] samples in Canada and abroad [38,39]. Mobile DNA elements such as transposons and plasmids are major vehicles for the dissemination of antimicrobial resistance genes (ARGs) through horizontal gene transfer within and across bacterial species.
Whole genome sequencing (WGS) is increasingly used by the Centers for Disease Control and Prevention, the Food and Drug Administration, the United States Department of Agriculture's Food Safety and Inspection Service [40] and the Public Health Agency of Canada [41] for surveillance and to discriminate among closely related STEC strains during outbreak events. STEC are prevalent and highly diverse in cattle [42], whereas strains that cause severe human disease are less diverse and less frequent [43]. Furthermore, there is growing evidence of secondary transmission of STEC O157:H7 [44,45], which suggests that humans can also act as carriers. Therefore, to understand the STEC O157 strains circulating in cattle and humans within the same geographic region (Alberta), we sequenced clinical isolates collected in hospitals and isolates collected from feedlot cattle from 2007 to 2015 for a comparative genomic analysis.

Discussion
This study conducted a comparative analysis of STEC O157 from cattle and clinical samples collected in the same geographic region. Our findings revealed both similar and dissimilar virulence profiles among STEC O157 strains based on MLST, stx typing and stx-phage insertion site genotyping. In addition, mobile DNA elements such as transposons and integrons, which are major contributors to genetic variation and drivers of antimicrobial resistance among bacteria, were identified. In silico serotyping revealed that clinical (n = 127) and cattle (n = 99) isolates with O- and H-antigen determinant genes also had a stx1+, stx2+, eae+ and hly+ profile, confirming their pathogenic potential as O157:H7. However, sixteen O157:non-H7 cattle isolates (H19, n = 6; H29, n = 5; H12, n = 5) were considered non-STEC and may not be pathogenic, as they all lacked stx1, stx2, eae and hly (stx1−, stx2−, eae− and hly−). Based on whole genome (wg) MLST, the most prevalent O157:H7 STEC clonal lineage circulating in cattle and humans belonged to ST11, the O157:H7 pathogenic sequence type [27], which corroborates previous E. coli O157:H7 studies in Alberta that found ST11 to be the predominant clone among cattle isolates [30,31]. Most ST11 isolates from both cattle and clinical samples possessed a similar virulence gene profile (stx1+, stx2+, eae+ and hly+), suggesting that wgMLST is a good predictor of cattle isolates that may cause clinical disease in humans. Similarly, ST10, a non-pathogenic type based on O157:H7 sequence typing [27], and other STs in cattle isolates (ST515, n = 4; ST763, n = 6; ST9964, n = 1) shared a non-pathogenic (stx1−, stx2−, eae− and hly−) profile.
Although wgMLST can distinguish potentially pathogenic from non-pathogenic strains and may have higher discriminatory power than phenotypic typing methods, its results should be interpreted with caution or combined with other virulence profiling methods, as it failed to assign an ST to two clinical isolates (stx1−/stx2+) in this study. These could be new, uncharacterized STs that possess a genetic repertoire different from the STs reported in the E. coli MLST database [27].
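The profile notation used above (stx1+/stx2+/eae+/hly+ versus stx1−/stx2−/eae−/hly−) is a simple presence/absence encoding of four marker genes. The following is a minimal sketch of how such profiles could be tallied from per-isolate gene calls; the isolate names and gene sets are hypothetical, the gene symbol "hlyA" follows the Abstract, and the rule that an isolate counts as STEC only if it carries stx1 and/or stx2 is an assumption made for illustration, not the authors' pipeline.

```python
from collections import Counter
from typing import Dict, Iterable, Set

# Marker genes named in the text; "hlyA" follows the Abstract's spelling.
MARKERS = ("stx1", "stx2", "eae", "hlyA")


def profile_string(genes_present: Iterable[str]) -> str:
    """Render a +/- profile such as 'stx1+/stx2+/eae+/hlyA+'."""
    present = set(genes_present)
    return "/".join(g + ("+" if g in present else "-") for g in MARKERS)


def is_stec(genes_present: Iterable[str]) -> bool:
    """Assumed rule: an isolate is STEC only if it carries stx1 and/or stx2."""
    return bool(set(genes_present) & {"stx1", "stx2"})


def tally_profiles(isolates: Dict[str, Set[str]]) -> Counter:
    """Count how many isolates share each +/- marker profile."""
    return Counter(profile_string(genes) for genes in isolates.values())


# Hypothetical isolates for illustration only, not data from this study.
example = {
    "cattle_01": {"stx1", "stx2", "eae", "hlyA"},
    "clinical_01": {"stx1", "stx2", "eae", "hlyA"},
    "clinical_02": {"stx2", "eae", "hlyA"},
    "cattle_02": set(),
}

for profile, count in tally_profiles(example).items():
    print(profile, count)
```

Keeping the marker order fixed in `MARKERS` makes profile strings directly comparable between cattle and clinical tallies.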
Stx1 and Stx2, including their subtypes, are classified based on differences in protein sequence and biological activity [47]. The toxicity and cell receptor binding affinity of the different stx types play a major role in clinical outcomes [48], with stx2a and stx2c being more potent than stx1, stx1c, stx2d and stx2e in humans [49][50][51][52]. The stx1a and stx2a subtypes identified in this study have previously been reported in cattle [31] and clinical [29] isolates from Alberta, suggesting that these are the main subtypes circulating in this region. This indicates a possible risk of E. coli O157:H7 circulating between cattle and humans, given that human-to-human transmission should be less common in high-resource countries because of more robust sanitation practices. A combination of stx2a and eae genes in O157:H7 may be associated with hemolytic uremic syndrome [29]. The majority of cattle and clinical isolates in this study possessed both stx2a and eae, which could be indicative of their virulence potential in humans.
The stx1 and stx2 genes are carried by two different prophages, which can both infect the same bacterial strain. In this study, fragmented assemblies did not reveal intact stx1- or stx2-carrying prophages in O157:H7 genomes. Therefore, two Sakai phages (Sp), Sp5 and Sp15 from E. coli O157:H7 Sakai, which carry stx2 and stx1, respectively, were used as reference stx-carrying phages to search for corresponding gene sequences in O157:H7 isolates. The int gene, which catalyses phage integration, had 100% sequence identity between the Sakai phages and the O157:H7 isolates from Alberta, illustrating the conservation of these phages within this serotype.
Stx-carrying phages in E. coli O157 are important for the dissemination of stx genes and for the genetic diversity of the bacterial host. Although most cattle and clinical isolates showed similar genotypic profiles by wgMLST and stx typing, this was not the case for stx1/2-carrying phage insertion site genotyping. There was a substantial disparity in wrbA insertion site occupancy between cattle and clinical isolates, which raises the question of why this site was occupied by phage in cattle, but not in clinical isolates. According to Serra-Moreno et al. [53], insertion site occupancy by stx-carrying phages is host strain and locus specific. Fourteen cattle isolates with a stx1−/stx2− profile, as expected, lacked phage occupancy at both the mlrA and wrbA insertion sites, whereas most clinical isolates (n = 95) unexpectedly had an unoccupied wrbA site. The differential occupancy by stx2-phages in clinical compared with cattle isolates suggests that O157:H7 strains circulating in humans and cattle differ, even though cattle are considered the main reservoir of this pathogen. This may serve as an important epidemiologic trait for differentiating these strains during outbreak events. This finding is supported by Shaikh and Tarr [9], who found that O157:H7 isolates from several sources, including hamburgers, had an unoccupied wrbA locus. However, most isolates in the study of Shaikh and Tarr [9] were stx1−/stx2+, whereas most O157:H7 clinical isolates in our study (n = 95) were stx1+/stx2+.
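Occupancy calls of this kind rest on whether a phage integrase (int) sits at the insertion-site gene in the assembly. The sketch below encodes that rule over annotated contig features; the `Feature` record, the gene labels and the 5 kb proximity window are hypothetical choices for illustration and are not specified in the source. Note that when the site gene and int lie on different contigs, this rule reports "unoccupied", which is exactly the ambiguity that fragmented assemblies introduce.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Feature:
    contig: str  # contig identifier in the draft assembly
    name: str    # annotated gene label, e.g. "wrbA" or "int" (hypothetical labels)
    start: int   # 1-based start coordinate
    end: int     # 1-based end coordinate (inclusive)


def interval_gap(a: Feature, b: Feature) -> int:
    """Distance in bp between two features on one contig (0 if they overlap)."""
    return max(b.start - a.end, a.start - b.end, 0)


def site_status(features: List[Feature], site_gene: str,
                int_gene: str = "int", window: int = 5000) -> str:
    """Call an insertion site occupied if an int gene lies within `window` bp of it.

    `window` is an assumed threshold, not a value taken from the study.
    """
    sites = [f for f in features if f.name == site_gene]
    if not sites:
        return "site not found"
    for site in sites:
        for f in features:
            if (f.name == int_gene and f.contig == site.contig
                    and interval_gap(site, f) <= window):
                return "occupied"
    return "unoccupied"


# Hypothetical annotations: int adjacent to wrbA on the same contig.
features = [Feature("contig_7", "wrbA", 1000, 1600),
            Feature("contig_7", "int", 1700, 2900)]
print(site_status(features, "wrbA"))
```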
To further understand the unoccupied wrbA locus in clinical isolates, we looked for sequence polymorphisms between the unoccupied wrbA locus of O157:H7 isolates and that of E. coli O157:H7 Sakai. A 33 bp deletion or an 18 bp insertion in the wrbA gene was identified; these modifications may be responsible for the lack of occupancy in most clinical compared with cattle isolates. An altered, unoccupied locus indicates that selection of a secondary insertion site may not be limited to absence of the primary site [53], but may also result from modification of the insertion sequence. We therefore hypothesized that the presence of stx2-carrying phages in O157:H7 with an unoccupied, defective wrbA locus may be an adaptive response by phages to overcome a possible host-associated defense mechanism through integration at a different site in the O157:H7 chromosome. Fragmented assemblies did not reveal the flanking genes (insertion site) of the phage integrase in isolates with an unoccupied and/or defective (33 bp deletion/18 bp insertion) wrbA site. Further, the int gene of stx2-carrying phages in isolates with a defective wrbA locus was 100% identical to that of Sp15, ruling out the possibility that changes in the int sequence were responsible for the lack of integration at this site.
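The 33 bp deletion and 18 bp insertion are read off a pairwise global alignment of each isolate's wrbA region against the Sakai reference. As a minimal sketch, the code below uses textbook Needleman-Wunsch with an illustrative linear gap penalty and then reports gap runs as indels in reference coordinates. The scoring values are assumptions; EMBL-EBI pairwise tools such as EMBOSS Needle use affine gap-open/gap-extend penalties, which keep a single indel in one block, whereas linear scoring can occasionally split a long indel into several shorter gap runs. The toy sequences are placeholders, not wrbA.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 1, -1, -2  # illustrative linear scoring, not EMBOSS defaults


def global_align(a: str, b: str) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment of reference `a` and query `b`."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP

    def sub(x: str, y: str) -> int:
        return MATCH if x == y else MISMATCH

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + GAP,   # gap in query
                              score[i][j - 1] + GAP)   # gap in reference
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def call_indels(ref_aln: str, qry_aln: str) -> List[Tuple[str, int, int]]:
    """List (kind, 0-based reference position, length) for each gap run."""
    events = []
    ref_pos, col = 0, 0
    while col < len(ref_aln):
        if qry_aln[col] == "-":  # reference bases missing from the query
            start, length = ref_pos, 0
            while col < len(ref_aln) and qry_aln[col] == "-":
                length, ref_pos, col = length + 1, ref_pos + 1, col + 1
            events.append(("deletion", start, length))
        elif ref_aln[col] == "-":  # extra bases present only in the query
            start, length = ref_pos, 0
            while col < len(ref_aln) and ref_aln[col] == "-":
                length, col = length + 1, col + 1
            events.append(("insertion", start, length))
        else:
            ref_pos, col = ref_pos + 1, col + 1
    return events


# Toy example: the query lacks a 33 bp block present in the reference.
ref = "G" * 10 + "T" * 33 + "C" * 10
qry = "G" * 10 + "C" * 10
print(call_indels(*global_align(ref, qry)))
```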
Other stx2-carrying phage integration sites in O157:H7, such as argW, sbcB and yecE, have been reported [6,54], and phages may prefer these sites when the primary wrbA site is absent [53]. Except for argW, these integration sites were unoccupied in this study. However, the int at the argW site was identical to the integrase of Sp16, a non-stx-carrying phage in E. coli O157:H7 strain Sakai. The fact that the 33 bp deletion is limited to most (n = 95) stx2-carrying clinical isolates with the stx1+/stx2+ profile suggests that they are a common clone circulating in humans and may be genotypically different from cattle isolates (n = 90) with the same stx1+/stx2+ profile. Only four cattle isolates with an unoccupied, defective wrbA locus had the same stx1+/stx2+ profile as the clinical isolates, possibly reflecting a lineage that circulates in both cattle and humans.
Interestingly, we also observed that 30.6% (15/49) of clinical and 4.1% (4/98) of cattle isolates with an occupied mlrA locus lacked stx1 in their genome. A possible explanation is partial loss of the stx1-carrying phage segment; for example, remnants of the integrated phage int gene were still present at the occupied site. This agrees with Shaikh and Tarr [9], who reported the absence of stx1 in most O157:H7 strains with an occupied mlrA. Shiga toxin loss has been reported in O157:H7 [30,55] and is strain- and stx type-related [56]. However, the absence of stx1 was more prevalent in clinical than in cattle isolates, suggesting that it could be host-associated. Remnants of stx1-carrying phage segments in O157 have been shown to recombine with other mobile genetic elements or to acquire and disseminate the stx1 gene [57]. Consequently, the defective stx1-carrying phages in O157:H7 strains in this study may reflect a cycle of loss and gain of these elements.
Apart from stx genes, most cattle and clinical isolates with a stx1+/stx2+ or stx1−/stx2+ profile possessed a similar virulence gene profile, which included LEE-encoded type III secretion Esp effector proteins, the tir receptor protein, the intimin protein, toxin B and associated adherence factors common to STEC O157 [58,59]. These factors, especially eae, which was detected in all stx1+/stx2+ and stx1−/stx2+ isolates, are important for STEC adherence to intestinal epithelial cells during human infection.
The number of isolates carrying antimicrobial resistance genes (ARGs) was equal (11 each) in cattle and clinical samples, but the resistance/multidrug profiles differed, possibly reflecting differences in antimicrobial use in humans vs. cattle. Although ARGs were detected in few cattle (8.8%; 11/125) and clinical (8.7%; 11/127) isolates, most genes were associated with antimicrobials of clinical importance, including category II antimicrobials such as aminoglycosides and penicillins [60]. This category was also more prevalent in clinical isolates with stx1+/stx2+ or stx1−/stx2+ profiles than in cattle isolates. Additionally, ARGs were flanked by transposase and integron-integrase genes, mobile elements that can facilitate the horizontal and vertical dissemination of ARGs within O157:H7 and other strains. The integron-integrase detected in this study belonged to a class 1 integron common in Enterobacteriaceae [61,62]. Streptomycin resistance genes (aadA1, aadA2, ant(3")-Ia and aph(3")-Ib) were the most abundant within the different antimicrobial resistance gene cassettes (ARCs), a finding that aligns with previous studies in Alberta [31,63].
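The percentages quoted in this Discussion (e.g., 8.8% = 11/125, 30.6% = 15/49) are simple proportions. The sketch below recomputes them from the reported counts and, as an illustrative addition that the source does not report, attaches a Wilson score 95% interval (a standard binomial interval) to show how much uncertainty counts this small carry. It is not the authors' analysis.

```python
import math
from typing import Tuple


def percent(k: int, n: int) -> float:
    """Proportion k/n as a percentage rounded to one decimal place."""
    return round(100 * k / n, 1)


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Counts as reported in the text.
counts = [
    ("cattle isolates with ARGs", 11, 125),
    ("clinical isolates with ARGs", 11, 127),
    ("clinical, occupied mlrA, stx1 absent", 15, 49),
    ("cattle, occupied mlrA, stx1 absent", 4, 98),
]
for label, k, n in counts:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {percent(k, n)}% (95% CI {100 * lo:.1f} to {100 * hi:.1f}%)")
```

Overlapping intervals for the cattle and clinical ARG proportions are consistent with the statement that ARG carriage was equally distributed.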
Two ARCs carrying beta-lactam resistance genes, with the profiles aph(3")-Ib/blaTEM-1C/sul2 and blaTEM-1B/tet(R)/tet(A), were associated with transposons, which are a common category of mobile DNA element in E. coli and Salmonella spp. [64]. ARC4, detected in one clinical and four cattle isolates and carrying ant(3")-Ia (streptomycin) and sul1 (sulfisoxazole), is indicative of an ARC of both cattle and human origin. ARC9, a non-integrated ARC (aph(3")-Ib, aph(6)-Id and sul2) found in three clinical isolates, may have the propensity to become integrated into an integron or transposon, which can facilitate the dissemination of ARGs. As with prophages, fragmented assemblies made it impossible to determine whether transposons and integrons were carried on plasmids, as plasmid replicon contigs were <1 kb. However, the majority of plasmid replicons detected in this study were of the Inc type, especially IncF, which are common plasmid replicons described in E. coli from animal and human sources [65]. Amongst the IncF plasmids, IncFII (clinical, n = 126; cattle, n = 116) and IncFIB (AP001918) (clinical, n = 134; cattle, n = 111) were equally distributed between clinical and cattle isolates, although some were unique to clinical (IncFII(pCoo), n = 1) or cattle (IncFIA(HI1), n = 1; IncFIB(K), n = 1; IncFIC(FII), n = 6) isolates. Shared and distinct plasmid replicon types among strains of different origins may be useful for plasmid characterization and for tracking the spread of ARGs, since certain plasmid types such as IncF, IncI, IncA/C and IncL are mostly associated with ARGs in Enterobacteriaceae [65].

Conclusions
This study analysed O157 STEC from cattle and clinical sources in the same geographic region using in silico typing methods. MLST and stx typing revealed that most O157:H7 isolates from cattle and clinical samples belonged to the ST11 lineage and had a stx1+/stx2+ profile. Furthermore, a common profile of additional virulence genes (eae, hlyA, tir, espA, espB, espP, espJ, katP, toxB, nleABC and chuA) was observed in cattle and clinical isolates. Beta-lactam resistance genes were detected only in four clinical isolates. The majority of clinical isolates with a stx2+ profile had a defective (nucleotide deletion or insertion) wrbA locus that was unoccupied by stx2-carrying phage. There is considerable interest in understanding why stx2-carrying phage occupancy differed so markedly between O157:H7 STEC from clinical and cattle samples in the same geographic region, which highlights the knowledge gap between stx2-phage occupancy and dissemination of stx2. We intend to combine long-read and short-read sequencing to generate hybrid assemblies that further define unique features with the potential to differentiate cattle from clinical O157:H7 isolates. If this difference in stx2-carrying phage occupancy between clinical and cattle isolates is a stable trait, it could be employed for food safety screening and monitoring of STEC O157:H7 in both farm and clinical environments.

Sample Collection and Bacterial Isolation
Cattle faecal samples were collected from 2007 to 2015 from commercial feedlot pens or from the floors of transport trucks. At feedlots, E. coli were isolated from rectal grab samples, hide swabs and pooled faecal pats from the pen floor (Table S1). Cattle isolates were collected from nine locations in southern Alberta, Canada.
For cattle isolates, presumptive O157 E. coli [66,67] were retrieved from storage at −80 °C and thawed, and 1 mL was incubated in 4 mL EC broth at 37 °C for 24 h. A 1 mL aliquot of the enriched culture was then centrifuged at 8000× g for 10 min before DNA was extracted from the pellet using the NucleoSpin Tissue Kit (Macherey-Nagel, Islington, ON, Canada). Extracted genomic DNA was used to confirm the O157 serogroup and the presence of stx1, stx2 and eae by PCR with the primers described by Conrad et al. [68].
Human clinical O157:H7 isolates (n = 322) were obtained from 2007 to 2015 from stool cultures of patients with gastroenteritis in Alberta, and O157 was confirmed by antiserum agglutination [69][70][71]. A random subset (n = 127) from the same sampling years as the cattle isolates was selected and sequenced. DNA was extracted using the MagaZorb DNA mini-prep kit (Promega Corporation, Madison, WI, USA). The study was approved by the University of Calgary Conjoint Health Research Ethics Board (REB19-0510).

Bacterial Whole Genome Sequencing and Data Analysis
DNA extracted from 252 bacterial isolates from cattle (n = 125) and humans (n = 127) was subjected to whole genome sequencing on an Illumina NovaSeq 6000 at Génome Québec (Montreal, QC, Canada) to generate 250 bp paired-end reads. Sequence data were downloaded as FASTQ files and assessed for quality using FastQC. Trimmomatic v0.36.5 was used to remove adapter sequences and low-quality sequences with a Phred quality score <30 (i.e., base-call accuracy <99.9%). Reads that met quality standards were assembled de novo into contigs using the Unicycler pipeline [72] and annotated using Prokka [73]. Using an average genome size of 5.5 Mb for E. coli O157, the average sequencing coverage was estimated at 158×. Assemblies of the 252 isolates contained, on average, 276 contigs per genome, of which an average of 117 were ≥1 kb, and had an N50 contig length of 154,553 bases (Table S2). The average genome size of the sequenced isolates was 5,304,568 bp (5.3 Mb). The assembled contigs were used for the in silico analyses described below. The draft whole genome sequence assemblies of the 252 bacterial isolates have been deposited in GenBank under BioProject ID PRJNA870153.
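The quality and assembly figures above follow from three standard definitions: a Phred score Q implies an error probability of 10^(−Q/10), so Q30 corresponds to 99.9% accuracy; mean coverage is total sequenced bases divided by genome size (at the assumed 5.5 Mb, 158× corresponds to roughly 869 Mb of sequence per isolate, which is derived arithmetic rather than a reported value); and N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the assembly. A minimal sketch of these definitions, with made-up contig lengths:

```python
from typing import Iterable


def phred_to_accuracy(q: float) -> float:
    """Base-call accuracy implied by Phred score Q, where P(error) = 10^(-Q/10)."""
    return 1 - 10 ** (-q / 10)


def mean_coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which sorted cumulative length reaches half the total."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no contigs supplied")


print(phred_to_accuracy(30))
print(mean_coverage(869_000_000, 5_500_000))
print(n50([100, 80, 50, 30, 20]))  # made-up contig lengths
```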

In Silico Serotyping and Multi-Locus Sequence Typing (MLST) Analysis
The ECTyper tool for Escherichia coli serotyping, version 1.0 [74], was used to confirm the serotype of isolates as O157 using default parameters: O antigen minimum ≥90% identity and coverage, and H antigen minimum ≥90% identity and ≥50% coverage. Genetic relatedness of cattle and clinical O157 STEC strains was determined using an in silico E. coli MLST scheme based on seven housekeeping gene loci (adk, fumC, gyrB, icd, mdh, purA and recA) previously described for E. coli [27]. The E. coli MLST database was used to assign an allele number to each locus and a sequence type (ST) to each unique combination of alleles.
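In a seven-locus MLST scheme, an ST is a lookup keyed on the ordered tuple of allele numbers; an isolate receives no ST if any locus carries a novel or missing allele, or if its allele combination is not yet in the database, which is one way the two clinical isolates discussed above could end up unassigned. A minimal sketch of that lookup follows; the profile table, allele numbers and ST labels are placeholders, not entries from the E. coli MLST database.

```python
from typing import Mapping, Optional, Tuple

# Locus order of the seven-gene E. coli scheme described in the text.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Placeholder profile table: allele tuples and ST labels are NOT real database entries.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "placeholder-ST-A",
    (1, 2, 1, 1, 1, 1, 1): "placeholder-ST-B",
}


def assign_st(alleles: Mapping[str, Optional[int]],
              profiles: Mapping[Tuple[int, ...], str]) -> Optional[str]:
    """Return the ST for a complete allele profile, or None if it cannot be assigned.

    None is returned when any locus is missing or novel (allele None),
    or when the allele combination is absent from the profile table.
    """
    key = []
    for locus in LOCI:
        allele = alleles.get(locus)
        if allele is None:
            return None
        key.append(allele)
    return profiles.get(tuple(key))


isolate = {locus: 1 for locus in LOCI}
print(assign_st(isolate, PROFILES))
```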

Identification of stx Genes, Prophages, stx-Carrying Phages and Associated Insertion Sites
Chromosomal sequences of O157 isolates were screened for stx genes and subtypes by searching genomes with the stx1 and stx2 primers [68], followed by subtype-specific (stx1a to stx1d and stx2a to stx2f) primers [75], in Geneious 10.2.6. FASTA files of O157 genomes were queried for intact, questionable and incomplete prophages harboured within the strains using Phage Search Tool Enhanced Release (PHASTER) [76]. Prophage regions with a completeness score <70 were classified as incomplete, 70 to 90 as questionable and >90 as intact [76]. The stx-carrying contigs and flanking genes were manually screened for stx-carrying prophages in Geneious 10.2.6 using E. coli O157:H7 Sakai phages (Sp5 and Sp15) as references. The wrbA, mlrA, argW, sbcB and yecE sequences from E. coli O157:H7 strain Sakai were manually compared with the assembled contigs of O157 isolates in Geneious to identify stx insertion sites and their possible occupancy, based on the presence of the stx-phage integrase (int) gene. To understand the discrepancy between the presence of stx-associated phages and unoccupied wrbA or mlrA insertion sites, we aligned the wrbA and mlrA regions of cattle and clinical isolates against the corresponding DNA sequences of O157 strain Sakai using a pairwise sequence alignment tool from the EMBL-EBI search and sequence analysis tools [77].
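Searching assemblies with PCR primers amounts to an in silico PCR: find the forward primer on a strand, then the reverse complement of the reverse primer downstream of it, and report the product if it is short enough. The sketch below does this with exact matching on both strands; it is not how Geneious implements primer search, real tools typically tolerate primer mismatches, and the primer sequences and `max_len` threshold in the example are hypothetical, not the stx primers of [68] or [75].

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def _scan(template: str, fwd: str, rev_site: str, max_len: int) -> List[Tuple[int, int]]:
    """Exact-match fwd ... rev_site products on one strand as half-open (start, end)."""
    products = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        if end != -1 and end + len(rev_site) - start <= max_len:
            products.append((start, end + len(rev_site)))
        start = template.find(fwd, start + 1)
    return products


def in_silico_pcr(template: str, fwd: str, rev: str,
                  max_len: int = 2000) -> List[Tuple[str, int, int]]:
    """Exact-match in silico PCR on both strands of a contig.

    Coordinates for "-" hits refer to the reverse-complemented contig.
    """
    template, fwd, rev = template.upper(), fwd.upper(), rev.upper()
    # The reverse primer's binding site reads as its reverse complement on the scanned strand.
    rev_site = revcomp(rev)
    hits = [("+", s, e) for s, e in _scan(template, fwd, rev_site, max_len)]
    hits += [("-", s, e) for s, e in _scan(revcomp(template), fwd, rev_site, max_len)]
    return hits


# Hypothetical primers and contig, for illustration only.
fwd_primer, rev_primer = "GATTACAGG", "CCTTGAGTA"
contig = "AAAA" + fwd_primer + "C" * 10 + revcomp(rev_primer) + "TTTT"
for strand, start, end in in_silico_pcr(contig, fwd_primer, rev_primer):
    print(strand, start, end, end - start)  # strand, coordinates, amplicon length
```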