Clonal Lineages and Virulence Factors of Carbapenem Resistant E. coli in Alameda County, California, 2017–2019

The prevalence of carbapenem-resistant Enterobacterales (CRE) has been increasing since the year 2000 and is considered a serious public health threat according to the Centers for Disease Control and Prevention. Few studies have genotyped carbapenem-resistant Escherichia coli using whole genome sequencing to characterize the most common lineages and resistance and virulence genes. The aim of this study was to characterize sequence data from carbapenem-resistant E. coli isolates (n = 82) collected longitudinally by the Alameda County Public Health Laboratory (ACPHL) between 2017 and 2019. E. coli genomes were screened for antibiotic resistance genes (ARGs) and extraintestinal pathogenic E. coli virulence factor genes (VFGs). The carbapenem-resistant E. coli lineages were diverse, with 24 distinct sequence types (STs) represented, including clinically important STs: ST131, ST69, ST95, and ST73. All Ambler classes of carbapenemases were present, with NDM-5 being the most frequently detected. Nearly all isolates (90%) contained genes encoding resistance to third-generation cephalosporins; blaCTX-M genes were most common. The number of virulence genes present within pandemic STs was significantly higher than the number in non-pandemic lineages (p = 0.035). Virulence genes fimA (92%), traT (71%), kpsM (54%), and iutA (46%) were the most prevalent within the isolates. Considering the public health risk associated with CRE, these data enhance our understanding of the diversity of clinically important E. coli that are circulating in Alameda County, California.


Introduction
Infections caused by carbapenem-resistant Enterobacterales (CRE) are a growing public health concern, as carbapenemase-producing organisms have become more prevalent in the human population [1,2]. Carbapenems have exceptional stability against most β-lactamases, have reduced vulnerability to β-lactam resistance determinants, and became the drug of choice for gram-negative bacteria resistant to cephalosporins [3]. Resistance to carbapenems among Enterobacterales is mediated by the loss of outer membrane porins, the overexpression of multi-drug efflux pumps, and enzyme-mediated resistance. Enzyme-mediated resistance is of critical clinical importance because these genes can be horizontally transferred and are associated with other clinically relevant resistance genes [4]. Over the past two decades, carbapenem resistance has increased to such a degree that in 2017, the World Health Organization (WHO) released a global priority list of pathogens and ranked CRE in the highest priority category, critical, representing the greatest threat to human health [5].
E. coli is associated with significant morbidity and mortality in infected individuals, and an increasing number of clinical reports describe extraintestinal and invasive infections due to these organisms [6]. Pathogenic E. coli are separated into pathotypes based upon the diseases they cause, the virulence factors present, and host reservoirs [7]. Extraintestinal pathogenic E. coli (ExPEC) are a broad collection of pathotypes that are facultative pathogens colonizing the extraintestinal environment of hosts. ExPECs are the most common gram-negative bacterial pathogens in humans, accounting for the largest percentage of cases of urinary tract infections (UTIs), sepsis, and bacteremia, with globally increasing mortality and morbidity [8,9]. ExPEC pathotypes include uropathogenic E. coli (UPEC), neonatal meningitis E. coli (NMEC), sepsis E. coli (SEPEC), and a non-human pathotype, avian pathogenic E. coli (APEC) [10]. ExPECs are derived from a small number of phylogenetic lineages, many of which are considered pandemic lineages. Four sequence types (STs), defined using multi-locus sequence typing, are responsible for nearly half of all the E. coli urinary tract infections and bloodstream infections in the world: pandemic lineages ST131, ST69, ST95, and ST73 [11]. However, recent studies have also found that ST405 and ST10 are rapidly increasing in prevalence and distribution [12]. Prior to the year 2000, ExPECs were mostly susceptible to first-line antibiotics [13]. However, as the use of β-lactam antibiotics to treat ExPEC has increased, highly resistant ExPECs (e.g., ESBL-producing ExPECs) are now widespread and important causes of UTIs and bacteremia [14]. Carbapenem resistance in ExPECs gives rise to clinical treatment challenges, as these bacteria are more likely to be resistant to all β-lactam antibiotics. When ESBL antibiotic resistance genes (ARGs) such as blaTEM, blaSHV, and blaCTX, and AmpC β-lactamase genes such as blaCMY, are combined with carbapenem resistance genes, the antibiotics available to treat these infections become limited [15].
E. coli are often resistant to carbapenems due to the production of carbapenemases, a type of β-lactamase enzyme that enables the bacterium to inactivate most β-lactam drugs, including carbapenems. Carbapenemases are separated into molecular classes of β-lactamases: the A, B, C, and D β-lactamases of the Ambler classification. Gram-negative bacteria, such as ExPECs, can disseminate ARGs between members of Enterobacterales through mobile genetic elements, which act as vehicles for transferring resistance mechanisms [16]. This horizontal spread of ARGs has resulted in more resistant strains of ExPECs. Much of the current literature concerning CRE focuses on Klebsiella pneumoniae due to the pandemic success of KPC-producing K. pneumoniae [17], and the dominance of all classes of carbapenemase genes within the genus [18]. There is limited research that characterizes the distribution of specific ARGs, virulence factors, and the correlation between the two within carbapenem-resistant E. coli. Although ARGs are a key survival mechanism for infectious ExPECs, the increased pathogenicity of these organisms is attributed to the high number of virulence factors they contain [19].
In E. coli, the spread of virulence factor genes (VFGs) occurs through horizontal gene transfer [20]. VFGs, often found on mobile genetic elements, encode proteins that contribute to the pathogenicity of ExPECs by enhancing the survival of the bacteria outside the intestines and helping them evade host defense mechanisms [20]. Although horizontal gene transfer is difficult to detect, we see evidence of its success in ExPECs becoming dominant pathotypes through clonal propagation [21,22]. Within the ExPEC pathotypes there is a large collection of highly conserved VFGs, which are commonly divided into subgroups: (1) adhesins, which enable attachment to host cells, commonly through fimbriae-mediated attachment, such as the fim, sfa, and pap virulence factors; (2) iron uptake/siderophores, which enable iron sequestration in low-iron environments, such as iutA; (3) protectins, such as kpsM, which prevents phagocytosis, or ompA, which encodes a porin, an outer membrane protein; (4) toxins, such as hlyD, which creates pores in the host cells, causing cell lysis; and (5) invasins, which aid in the crossing of the blood-brain barrier, such as ibeB [10]. The presence of these VFGs is often used as a classification mechanism to identify ExPECs, though there is little consensus as to which VFGs should be used for ExPEC classification [8].
Whole genome sequencing (WGS) has supported a more comprehensive characterization of antibiotic resistant bacteria and their mechanisms. In this study, we characterize and compare sequence data from carbapenem-resistant E. coli isolates collected by the Alameda County Public Health Laboratory (ACPHL) in California between 2017 and 2019. As antibiotic resistance is a threat to public health, the ACPHL performs continued surveillance of these critical organisms through a health officer order mandating the submission of carbapenem resistant E. coli. As a result of this surveillance, this analysis aimed to evaluate the genomic features of clonal species to enhance the understanding of clinically important carbapenem-resistant E. coli circulating in Alameda County, California.

Distribution of Antibiotic Resistance Genes
In total, 71 ARGs representing resistance to nine different classes of antibiotics were present across the carbapenem-resistant E. coli sequenced. The resistance profile for the isolate pool was diverse and comprised five carbapenemase genes across Ambler classes: one isolate contained blaKPC, a class A gene; 20 isolates contained class B blaNDM family genes; and six contained class D blaOXA family genes (see Table 2). The carbapenem resistance gene blaNDM-5 was highly prevalent within ST405, ST167, and ST6870; likewise, blaOXA-181 was most often found within ST410. ST131 contained the broadest diversity of carbapenemase genes. All four of the ST90 isolates originated from the same patient; however, only two of the ST90 isolates contained blaNDM-1 carbapenemase genes (see Supplementary Table S3).
Resistance genes for third-generation cephalosporins, blaCTX-M-15 and blaTEM-1B, were found in many STs (41.6% and 62.5% of STs, respectively). Furthermore, aminoglycoside resistance was found in more than half the STs (58.3%) via the streptomycin resistance genes aph(6)-Id_1 and aph(3")-Ib_5. When observing the total number of ARGs by ST, both ST131 and ST405 contained the broadest class range and largest number of ARGs. Among ST131 isolates, the maximum number of ARGs present was 13, with a minimum of 2, a standard deviation of 4.04, and an IQR of 7. Similarly, in ST405 isolates, there was a maximum of 17 ARGs, the highest observed number of ARGs in any isolate, with a minimum of 1, a standard deviation of 4, and an IQR of 7. STs 95, 73, 1193, 144, and 122 contained fewer ARGs than the median ARG count of eight. Of note, ST95 contained one ARG, while ST73 contained two ARGs.
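The per-ST summary statistics reported above (maximum, minimum, standard deviation, and IQR of ARG counts) can be illustrated with a minimal sketch. The counts below are hypothetical placeholders, not the study data, and the quartile method ("inclusive" interpolation) is an assumption; other quartile conventions give slightly different IQRs.

```python
import statistics

def summarize_arg_counts(counts):
    """Return max, min, sample SD, and IQR for per-isolate ARG counts."""
    # Quartiles by linear interpolation over the observed range (assumed method).
    q1, _, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {
        "max": max(counts),
        "min": min(counts),
        "sd": statistics.stdev(counts),  # sample standard deviation (n - 1)
        "iqr": q3 - q1,
    }

# Hypothetical ARG counts for the isolates of one ST (illustrative only).
example_counts = [2, 4, 6, 8, 10, 12, 13]
summary = summarize_arg_counts(example_counts)
print(summary)
```

Reporting the IQR alongside the standard deviation is useful here because ARG counts within an ST can be skewed by a few heavily resistant isolates.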

Distribution of Virulence Genes
In total, 560 virulence genes were identified within the 82 E. coli analyzed. Adhesins fimG, fimF, and fimD were present in 100% of the isolates. Other adhesins, notably the curli fiber expression genes csgA, csgC, and csgG, were present in most isolates (98%). Invasins ibeB, nadA, and nadB were also present in most isolates (98%). Siderophore VFGs including entA and fepA were present in 95% of the isolates, with fyuA present in 78% of the isolates. Enterotoxigenic E. coli VFG eaeH was present in 99% of the isolates, with the related eaeX being present in 13%.
Putative ExPEC genes for adhesins (all of the afa family, fimA, papA, and papC, and all of the sfa family), siderophore (iutA), protectins (kpsM and traT), and the toxin hlyD were selected for further analysis based on results obtained by Johnson et al. [8]. The median number of VFGs per isolate was 190, with a smaller proportion of genes attributed to ExPEC-related VFGs. As shown in Figure 2A and Figure 2B, the VFGs fimA (92%), traT (71%), kpsM (54%), and iutA (46%) were present in most STs.
A Welch two-sample t-test determined that there was a statistically significant difference in the mean number of VFGs between pandemic and non-pandemic ST lineages (p = 0.035). There was no significant correlation between VFGs and ARGs occurring in the same isolates (p > 0.05), with numerous weak negative correlation coefficients, the largest being r = −0.042 for papA and dfrA12 and r = −0.042 for papA and aadA2. There was also no significant correlation between VFGs and isolate sample source, whether urine, blood, skin, stool/rectal swabs, other bodily fluids such as sputum, peritoneal fluid, and dialysate fluid, or other sources.
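For readers unfamiliar with the Welch test, the following is a minimal sketch of how its t statistic and Welch-Satterthwaite degrees of freedom are computed. The VFG counts are hypothetical placeholders rather than the study data, and the p-value step (which needs the t-distribution) is deliberately left out.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t_stat = diff / math.sqrt(se2_a + se2_b)
    # Degrees of freedom are adjusted because the variances are not pooled.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df

# Hypothetical per-isolate VFG counts (illustrative only).
pandemic = [200, 210, 195, 205]
non_pandemic = [180, 190, 185, 175, 195]
t_stat, df = welch_t(pandemic, non_pandemic)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

Unlike Student's t-test, this form does not assume equal variances in the two groups, which suits lineage groups of unequal size.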
There were seven instances where more than one isolate was collected from the same patient. In all seven cases, the STs were identical among isolates originating from each patient, and there was no difference in the putative ExPEC gene profile when isolates from the same patient were compared. However, the total virulence gene profile varied from organism to organism. Although horizontal gene transfer is difficult to detect, we saw evidence of clonal variation in patients with multiple isolate samples [21,22]. In Figure 2B, putative ExPEC virulence genes were selected for analysis, with these representing 20 subtypes from seven virulence gene families (afa, fim, hlyD, iutA, kpsM, traT, pap, sfa) within 24 STs and one unknown ST category. ST131 contained multiple afa adhesin genes, while ST95 contained numerous sfa adhesin genes. The gene traT, which is commonly associated with ExPEC pathotypes, was present in 71% of the STs [24–26].

Discussion
Our results provide a comprehensive genotypic characterization of carbapenem-resistant E. coli isolated from infections in Alameda County, California between 2017 and 2019. Drug-resistant infections in Alameda County were caused by a broad set of E. coli phylogroups. This differs from what has been reported previously for extended spectrum β-lactamase (ESBL) infections, which suggests that only the phylogroups B2 and D are of primary importance [10,19,27]. Furthermore, a molecular analysis of our data revealed the presence of several pandemic strains (e.g., ST131, ST69, ST95, and ST73), which have been identified across the globe, as well as ST405 and ST10, which have been identified as rapidly increasing in prevalence and distribution [11,12]. However, our analysis also demonstrated a significant amount of variation in STs, which was unexpected. Although a recent review of ExPECs identified a similarly broad range of global STs contributing to the global burden of disease [28], other studies evaluating global and regional carbapenem-resistant E. coli have reported only a small number of STs [29,30]. Our study demonstrates the rapid expansion of carbapenem resistance among a broader range of STs within Alameda County, California.
The breadth of E. coli STs as producers of all Ambler class carbapenemases is noteworthy and potentially reflects the important role of E. coli in the dissemination of carbapenemases within the region. The literature has identified ST131 as the major cause of serious multidrug-resistant E. coli infections globally [28,31]. Our study found that within ST131, there was large variation in the carbapenemase genes present, including classes A, B, and D; class C was not present. The spectrum of carbapenemase genes present in the ST131 isolates shows their ability to incorporate important ARGs and is likely associated with their capacity to cause disease [4,32]. Carbapenemase genes were detected in only 33% of the ST131 isolates. This proportion is slightly lower than that reported by a similar Northern California study analyzing carbapenem resistance between 2013 and 2016, but the difference may be attributed to a much smaller sample size (n = 24) [33]. The discrepancy between phenotypic and genotypic carbapenem resistance is potentially explained by alternative genetic mechanisms such as chromosomal mutations, drug efflux pumps, porins [34,35], or the overexpression of the extended spectrum β-lactamase and ampC β-lactamase genes [36]. This discrepancy is highlighted by a group of four isolates, all of which are ST90, which were collected from the same patient and included in this study. Two isolates contained blaNDM-5 resistance genes, while two isolates contained no carbapenemase genes (see Supplementary Table S3). Further in-depth analyses of resistance determinants would clarify the full extent of each of these mechanisms in mediating carbapenem resistance.
In total, 71 ARGs that encoded resistance to antibiotics, representing nine different classes, were identified. Seventy-four out of the 82 isolates contained genes for β-lactamases from the families of CTX, CMY, SHV, and TEM. The presence of both carbapenem resistance and third-generation cephalosporin resistance in many isolates is indicative of both the historical impact of cephalosporin use [3] and the significance of this threat in the context of the increase in untreatable clinical infections [5,37]. Furthermore, studies have demonstrated that the multidrug transporter gene mdf(A) confers broad-spectrum antibiotic resistance [18,22]. Our study found that the mdf(A) gene was present in every isolate, and each isolate also contained numerous other resistance genes. Therefore, the resistance profile of each isolate was extremely broad. The presence of such a broad range of resistance genes among a diverse pool of STs underscores the adaptability of E. coli pathotypes and constitutes a significant threat to human health due to the limitations medical providers would face when searching for treatment options for these infections [28,38].
The most common ESBL genes globally are the blaCTX-M genes, of which CTX-M-15 is one of the predominant allelic variants. The literature indicates a significant increase in the global distribution of these genes and a subsequent rise in ESBL infections [32,39]. Our study found that over half of the E. coli contained CTX-M genes. Furthermore, all ST131 isolates contained blaCTX-M, with over half being blaCTX-M-15, which is in line with recent studies that have demonstrated that ST131 is strongly associated with blaCTX-M-15 [28,31,40,41]. The relationship between ST131 and blaCTX-M-15 is noteworthy due to the high burden of disease attributable to ST131 [5,31]. As the global prevalence of ESBLs and ESBL-producing Enterobacterales infections continues to rise, the spread of ESBL genes will likely drive an increase in prescriptions for carbapenems and therefore an increase in carbapenemase genes, threatening the long-term effectiveness of this last-line antibiotic.
This study also identified 19 isolates that contained both blaOXA-1 and blaCTX-M-15, with a significant correlation among four STs: ST131, ST405, ST345, and ST167. The co-carriage of blaOXA-1 and blaCTX-M-15 has been previously identified in ST131 [42] and ST405 [43]; however, our study also showed co-carriage in ST345 and ST167, which had not been previously identified. Previous studies have demonstrated that the co-carriage of blaOXA-1 and blaCTX-M-15 is mediated through co-location on a plasmid [42,43]. This study did not evaluate plasmid composition, and this represents a limitation of the analysis.
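A test of independence for gene co-carriage can be sketched as a Pearson chi-square statistic on a 2x2 presence/absence table. In the example below only the count of 19 co-carrying isolates and the total of 82 come from the text; the other three cells are hypothetical placeholders, and no continuity correction is applied (an assumption; some software applies one by default for 2x2 tables).

```python
def chi_square_2x2(both, only_a, only_b, neither):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    observed = [[both, only_a], [only_b, neither]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of the two genes.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

# 19 isolates carry both genes (from the text); remaining cells are hypothetical.
chi2 = chi_square_2x2(both=19, only_a=3, only_b=20, neither=40)
# With 1 degree of freedom, the critical value at alpha = 0.05 is 3.841.
print(f"chi-square = {chi2:.2f}, exceeds 3.841: {chi2 > 3.841}")
```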
A key feature of ExPEC is the high virulence capacity of this organism [44,45]. The efficiency with which mobile genetic elements transfer VFGs facilitates the dissemination and accumulation of virulence factors among ExPEC strains [30]. Among the 82 E. coli isolates in our analysis, there were 560 unique VFGs. VFGs allow the organism to survive, adapt, and evade the host environment and defense mechanisms [20]. The number of VFGs within pandemic ST lineages was significantly higher than in non-pandemic lineages and may be related to the increased pathogenicity of the pandemic STs. Putative ExPEC genes such as fimA, kpsM, and iutA were present in most STs, indicating highly virulent isolates and the breadth of plasticity within these bacteria. These findings complement recent studies which demonstrated that among isolates identified as ExPEC by putative ExPEC genes, there were many VFGs present which aid the bacteria in causing a broad range of infections [10].
Our study aligns with the work of Johnson et al. and Hung [46,47], who determined that iron uptake gene fyuA has a higher prevalence than iutA and that protectin traT has a higher prevalence than kpsM. The current virulence factor gene bank used as definitive markers for ExPECs is based upon a large dataset originating primarily from phylogroups B2 and D [48–50]. Although these two groups are hypothesized as the source of horizontal transfer among E. coli, our study demonstrates that the current breadth of clinically relevant carbapenem-resistant ExPECs has expanded far beyond these two "original" phylogroups. This raises concern for establishing pathogenesis based upon virulence parameters which may only represent a distinct subset of the pathotype. Of the ExPEC pathotypes, certain virulence genes are attributed to specific pathotypes and represent the adaptation to colonize a specific niche. VFGs such as the iron uptake genes fyuA and chuA and the proteolytic toxin gene vat are associated with UPEC [47,51], while the porin gene ompA and invasin ibeA are associated with NMEC [45,52]. Several of our isolates contained VFGs that are related to separate ExPEC pathotypes, which conflicts with the current literature that bases pathotype identification only on virulence factor analysis. Other studies also identified similar trends in heterogeneity of NMEC virulence factors which conflict with pathotyping techniques [53,54]. Furthermore, each isolate from our study that deviated from common pathotype virulence comparisons was from a unique ST. The overlap of ExPEC pathotype-related VFGs underscores the challenges of using a putative gene method to identify ExPEC pathotypes. One possible confounder of this typing method could be the overlap between commensal intestinal colonization and extraintestinal virulence. Recent studies have identified numerous virulence factors which are present in both commensal and pathogenic E. coli [29,55]. Further investigation of associated mechanisms and the inclusion of other markers such as resistance phenotypes and clinical syndromes is required to increase the predictive power of virulence factors for pathotypes.
Although ExPEC pathotyping by virulence factors lacks definition, there is a strong body of evidence for the use of VFGs in pathotyping enteropathogenic E. coli (EPEC) and enterotoxigenic E. coli (ETEC) [56,57]. Highly conserved eaeH and eaeX genes, which are used as key identifying markers for the pathotypes, were present in almost all our study isolates. This finding may represent a relationship between EPEC and ExPEC, explained by their transient colonization of the gut and then exposure to sterile sites, as opposed to their commensal origin, which does not carry the VFGs identified [58].
This study did not include all Enterobacterales species, and an evaluation of ARGs, VFGs, and plasmids among different Enterobacterales species would greatly improve our understanding of the spread of antibiotic resistance and the mechanisms involved. Additional analyses of the plasmids would be helpful in understanding the genetic elements driving the spread. Identifying the locations of ARGs and VFGs (chromosomal, plasmid-borne, or both) is essential for understanding the horizontal gene transfer mechanisms of these organisms. Another limitation is the lack of clinical data that could help us to link isolates epidemiologically and to pinpoint the potential origin of infection and spatial relationships. Although a data-use agreement is in the process of being completed, this study would have benefited greatly from background information pertaining to the clinical isolates other than the date range, isolate derivation from the inpatient hospital, skilled nursing facilities, or long-term acute care facilities, and source type.
The purpose of this study was to enhance the understanding of clinically important drug-resistant E. coli that are present in Alameda County, California, and it has described the most common types of virulence and resistance genes. Moreover, this study has demonstrated a high prevalence of CTX-M-mediated resistance. The presence of CTX-M genes will inevitably lead to increased carbapenem use in humans, with an associated rise in resistance. Understanding temporal trends in these genotypes will help in the development of hypotheses as to why these changes occur, supporting strategies for reducing the spread of AMR in Gram-negative bacteria, averting excess mortality, and preserving existing classes of antibiotics for future generations.

Conclusions
Carbapenem resistance is a rising concern for public health departments. Our study has characterized the genomic features of E. coli isolates responsible for carbapenem-resistant infections in Alameda County from 2017 to 2019. In contrast to previous findings, our data suggest that a broad range of phylogroups and sequence types are responsible for these infections, rather than only the traditional high-risk clonal groups (e.g., ST131, ST69, and ST10). A broad repertoire of carbapenemase genes from each Ambler class was in circulation in the Alameda County health facilities, in addition to ESBL genes including blaCTX-M-15. Over 500 known virulence genes were detected within the 82 isolates characterized, and at least one ExPEC-associated virulence gene was detected in all isolates. Pandemic ST lineages carried a higher number of virulence genes than non-pandemic lineages. The heterogeneity of clonal group-associated resistance genes and virulence genes detected in our analysis suggests the high adaptability and diversity of E. coli pathotypes in the community. This will present an ongoing challenge for public health mitigation and will demand broad genomic surveillance.

Materials and Methods
A total of 82 carbapenem-resistant E. coli isolates were received by the Alameda County Public Health Laboratory (ACPHL) between June 2017 and July 2019. All isolates were obtained from Clinical Laboratory Improvement Amendments (CLIA) certified clinical laboratories within the county of Alameda and originated from individual patients. Seven patients had multiple isolates due to special circumstances, such as facility transfer or variation in the isolate susceptibility profiles (see Supplementary Table S2). Isolates were derived from inpatient hospitals, skilled nursing facilities, or acute care facilities.

Susceptibility Testing
All of the healthcare facility labs that supplied isolates to ACPHL were CLIA certified. Most labs performed susceptibility testing using the minimum inhibitory concentration (MIC) method; however, a small number of facilities used the Kirby-Bauer disk diffusion method. All isolates were determined to be carbapenem-resistant according to the CLSI M100 guidelines, and CLSI breakpoints for ertapenem, imipenem, and meropenem were used. E. coli ATCC 25922 was used as a control strain.

Whole Genome Sequencing
Purified DNA was extracted from each E. coli isolate for whole-genome sequencing and subsequent genomic analyses. DNA extraction was performed by ACPHL using a Roche MagNA Pure Compact Instrument (Roche, Basel, Switzerland), in accordance with the manufacturer's protocols. Purified DNA was quantified by fluorimetry using Qubit 3.0 (Invitrogen, Carlsbad, CA, USA). ACPHL performed library preparation for whole-genome sequencing by starting with the quantified DNA diluted to 0.2 ng/µL, which was then fragmented and tagged using a Nextera XT Library Preparation Kit (Illumina, San Diego, CA, USA). Indexed libraries were purified and quantified for quality using a Qubit 3.0 fluorometer in combination with the Qubit negative and Qubit positive broad range standards. Samples were pooled in equimolar quantities and subjected to DNA sequencing using single-end, 150-cycle reactions in a MiSeq (Illumina, San Diego, CA, USA) at ACPHL.
The Illumina dual-indexed, single-read sequences were assembled using Unicycler, a SPAdes assembly pipeline with polishing steps using Pilon, Bowtie2, and Samtools [59]. De-novo assembly was performed for all bacterial isolate genomes using sequence reads with a minimum length of 500 bp.
Each sequencing run procedure, starting from DNA extraction through sequencing and sequencing analysis, included the isolate BAA-2146 (ATCC) as a control. BAA-2146 is a clone of Klebsiella pneumoniae which has been completely sequenced and its plasmid content fully characterized. Library prep and the sequencing process included a final FASTA file of BAA-2146, which was annotated for specific genes distributed throughout the chromosome and plasmids as a means of quality control.

Genotypic Analysis
The ACPHL performed species identification using GAMBIT software. The GAMBIT reference database contains over 50,000 genome sequences representing 1445 bacterial species compiled and curated from the National Center for Biotechnology Information's Reference Sequence Collection, which includes over 4000 genome sequences from E. coli.
Phylogroup analysis of the E. coli genomes was performed in silico using ClermonTyper [60,61]. Multi-locus sequence typing was performed on all sequences using seven housekeeping genes, adk, fumC, gyrB, icd, mdh, purA, and recA, by cross-referencing the isolate sequence types with the Center for Genomic Epidemiology MLST 2.0 database, which makes use of MLST allele sequences and profile data from PubMLST.org [62]. E. coli subtyping using fimH single nucleotide polymorphism (SNP) analysis was also performed on all E. coli sequences using the FimTyper-1.0 database from the Center for Genomic Epidemiology [63] with a threshold ID of 95%.
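Conceptually, seven-locus MLST assigns an ST by looking up the combination of allele numbers at the seven housekeeping genes in a profile table. The sketch below illustrates that lookup only; the profile table, allele numbers, and ST labels are hypothetical placeholders and are not taken from PubMLST or from the MLST 2.0 tool.

```python
# The seven housekeeping loci named in the text, in a fixed order.
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Hypothetical profile table: allele numbers and labels are placeholders.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-example-1",
    (1, 2, 1, 3, 1, 1, 2): "ST-example-2",
}

def assign_st(alleles):
    """Map a dict of locus -> allele number to an ST label, or 'unknown'."""
    key = tuple(alleles.get(locus) for locus in LOCI)
    return PROFILES.get(key, "unknown")

isolate = {"adk": 1, "fumC": 2, "gyrB": 1, "icd": 3, "mdh": 1, "purA": 1, "recA": 2}
print(assign_st(isolate))
```

An isolate whose allele combination is absent from the table falls into the "unknown" category, which mirrors the single unknown ST category reported in the results.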
The sequence data for the 82 E. coli genomes were screened to identify antibiotic resistance genes (ARGs) and VFGs using the software ABRicate [64], a tool for the mass screening of contigs. ABRicate software integrates the ResFinder 4.0 database [65] and the Virulence Factor Database (VFDB) [66]. ResFinder search parameters were set to the default: 90% sequence identity with a minimum sequence overlap length of 60%. VFDB BLAST search parameters were set to default using the nucleotide sequences from the VFDB core dataset (setA) database and the blastn program: low complexity filter, Expect 0.01, and Matrix BLOSUM62.
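The identity and overlap thresholds described above amount to a simple filter on each candidate hit. The following sketch shows that filter on hypothetical hit records; the field names and values are illustrative assumptions and do not reproduce the output format of ABRicate or ResFinder.

```python
# Hypothetical hit records; field names are illustrative, not a tool's columns.
hits = [
    {"gene": "blaNDM-5", "identity": 100.0, "coverage": 100.0},
    {"gene": "blaCTX-M-15", "identity": 99.8, "coverage": 58.0},
    {"gene": "aadA2", "identity": 88.5, "coverage": 100.0},
]

def passes_thresholds(hit, min_identity=90.0, min_coverage=60.0):
    """Keep a hit only if it meets both the identity and overlap cut-offs (%)."""
    return hit["identity"] >= min_identity and hit["coverage"] >= min_coverage

kept = [hit["gene"] for hit in hits if passes_thresholds(hit)]
print(kept)  # ['blaNDM-5']
```

Requiring both conditions means a near-identical but truncated match, or a full-length but divergent one, is not counted as a detected gene.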
For phylogenetic analyses, Snippy v3.2 [67] was used to map reads to the E. coli MG1655 reference genome (NCBI:txid511145). The SNP distance between isolates was calculated using snp-dists v0.8.2 [68]. FastTree v2.0 was used to generate an approximate maximum likelihood phylogenetic tree using the general time reversible model. The tree was visualized in iTOL v6 [69].
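A pairwise SNP distance is, in essence, a count of aligned positions at which two isolates carry different bases. The sketch below illustrates the idea on toy sequences; the isolate names and sequences are hypothetical, and skipping positions where either sequence has a gap or ambiguous base is an assumption of this sketch rather than a description of any particular tool.

```python
from itertools import combinations

VALID_BASES = set("ACGT")

def snp_distance(seq_a, seq_b):
    """Count aligned positions where both sequences have A/C/G/T and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )

def distance_matrix(alignment):
    """Return {(name_a, name_b): distance} for every pair of isolates."""
    return {
        (name_a, name_b): snp_distance(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(sorted(alignment), 2)
    }

# Toy core-genome alignment (hypothetical isolates and sequences).
alignment = {
    "isolate_1": "ACGTACGT",
    "isolate_2": "ACGAACNT",
    "isolate_3": "TCGAACGT",
}
distances = distance_matrix(alignment)
print(distances)
```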

Statistical Analysis
Chi-square tests for independence, with a significance level of alpha = 0.05, were used to assess the independence between genes. A Welch two-sample t-test was used to evaluate the difference in the mean number of virulence factor genes between pandemic and non-pandemic lineages using a significance level of alpha = 0.05. Pearson correlation coefficients were calculated to evaluate the correlation between the presence of VFGs and ARGs among isolates, between ST and isolate specimen source, and between VFGs and isolate specimen source. To show the proportion of each genome made up by specific genes, the relative abundance was calculated by taking the total length of the ARG sequences and dividing it by the total genome length for each isolate. The ST relative abundance was calculated by taking the mean of the relative abundance for all the isolates within the ST group. All statistical analyses were performed using RStudio software version 1.4.1103 with the following R packages: ggplot2 for visualization [70] and dplyr [71], stringr [72], tidyr [73], knitr [74], kableExtra [75], and fastDummies [76] for data manipulation and reporting.
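The relative abundance calculation described above can be sketched as follows. The study's analysis was performed in R; this Python version is only an illustration, and the record layout, gene lengths, and genome lengths are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import mean

def relative_abundance(arg_lengths, genome_length):
    """Total ARG sequence length divided by the total genome length (bp / bp)."""
    return sum(arg_lengths) / genome_length

def st_relative_abundance(isolates):
    """Mean per-isolate relative abundance for each sequence type."""
    by_st = defaultdict(list)
    for iso in isolates:
        by_st[iso["st"]].append(
            relative_abundance(iso["arg_lengths"], iso["genome_length"])
        )
    return {st: mean(values) for st, values in by_st.items()}

# Hypothetical isolates: ARG lengths and genome lengths in base pairs.
isolates = [
    {"st": "ST131", "arg_lengths": [1000, 500], "genome_length": 5_000_000},
    {"st": "ST131", "arg_lengths": [500], "genome_length": 5_000_000},
    {"st": "ST405", "arg_lengths": [2000], "genome_length": 4_000_000},
]
st_means = st_relative_abundance(isolates)
print(st_means)
```

Averaging per-isolate ratios, rather than pooling lengths across an ST, keeps each isolate's contribution equal regardless of genome size.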