Tackling Drug Resistant Infection Outbreaks of Global Pandemic Escherichia coli ST131 Using Evolutionary and Epidemiological Genomics

High-throughput molecular screening is required to investigate the origin and diffusion of antimicrobial resistance in pathogen outbreaks. The most frequent cause of human infection is Escherichia coli, which is dominated by sequence type 131 (ST131)—a set of rapidly radiating pandemic clones. The highly infectious clades of ST131 originated firstly by a mutation enhancing conjugation and adhesion. Secondly, single-nucleotide polymorphisms occurred enabling fluoroquinolone-resistance, which is near-fixed in all ST131. Thirdly, broader resistance through beta-lactamases has been gained and lost frequently, symptomatic of conflicting environmental selective effects. This flexible approach to gene exchange is worrying and supports the proposition that ST131 will develop an even wider range of plasmid and chromosomal elements promoting antimicrobial resistance. To stop ST131, deep genome sequencing is required to understand the origin, evolution and spread of antimicrobial resistance genes. Phylogenetic methods that decipher past events can predict future patterns of virulence and transmission based on genetic signatures of adaptation and gene exchange. Both the effect of partial antimicrobial exposure and cell dormancy caused by variation in gene expression may accelerate the development of resistance. High-throughput sequencing can decode measurable evolution of cell populations within patients associated with systems-wide changes in gene expression during treatments. A multi-faceted approach can enhance assessment of antimicrobial resistance in E. coli ST131 by examining transmission dynamics between hosts to achieve a goal of pre-empting resistance before it emerges by optimising antimicrobial treatment protocols.

Approximately half of hospital-acquired (HA) and 70%-95% of community-acquired (CA) UTIs are caused by E. coli, and these are suspected to originate from intestinal colonisers [37,38]. There are multiple potential sources of E. coli UTIs and BSIs because it contaminates food [39][40][41] such as fruit [42] and meat [43], and infects companion animals [44] like cats and dogs [45]. E. coli strains are categorised in terms of their anatomical niches into three main groups [46]: the first are commensals, which typically belong to phylogenetic groups A or B1, and are not normally pathogenic when outside the intestinal tract because they lack virulence factors. The second are intestinal E. coli that are generally absent in healthy hosts, are virulent and come from a variety of phylogenetic groups. The third affect extraintestinal areas (e.g., UTIs): these are genetically distinct from intestinal and commensal E. coli, and ordinarily come from phylogenetic groups B2 or D. They may exist as asymptomatic gut commensals (see for example [47]), but can efficiently colonise regions outside the intestine.
Bacterial AMR is a major threat to public health [69]: AMR in E. coli is evident worldwide [4] and encompasses many compounds [24,70]. AMR in E. coli is becoming more frequent [71] and is driven by environmental exposure [72], such as effluent waste water [73,74] and soil treated with manure from antibiotic-treated livestock [75]. Antibiotic treatment failure jumped by ~12% in the UK from 1991 to 2012, and was more frequent against certain second-line antimicrobials (such as quinolones and cephalosporins) than in first-line ones (penicillins, macrolides, flucloxacillins) [76]. 75% of E. coli from 1997 to 2007 in Irish hospitals displayed AMR to eight or more of 16 antimicrobials [77]. ESBL-producing E. coli are resistant to cephalosporins and fluoroquinolones, leaving carbapenems as the sole last-resort antimicrobial [4]. However, multi-AMR ESBL-producing ST131 with carbapenem-resistance are now frequent [78] and in Algeria [79], China [80] and Ireland [81].
ST131 represents a universal problem whose evolutionary epidemiology needs deeper study. Not enough is known about how it evolves and spreads at a local scale [82,83]. As a fraction of total E. coli infections, ST131 infection rates are proportionally higher in long-term care facilities (LTCFs) (76%) than hospitals (49%) and the public (15%) [84]. This was originally driven by the acquisition of fluoroquinolone-resistance sometime prior to 2000, and has subsequently been amplified by the gain of CTX-M (cefotaximase) elements, which are beta-lactamases (bla) that hydrolyse beta-lactam rings [85]. UTIs caused by blaCTX-M-positive ST131 are now widespread in LTCFs [86], hospitals [87] and the community [88][89][90], to the point where all ESBL-producing E. coli in certain LTCFs are ST131 [91]. Notably, in the latter study 93% of these belonged to just a single PFGE pulsotype. Current hypotheses on the sources of E. coli have identified LTCFs as a closed microenvironment in which AMR bacteria evolve [92], and subsequently they diffuse into the wider community, spreading AMR alleles [93].

The Genomic Landscape of Antimicrobial Resistance in E. coli
Commonly E. coli genomes are separated into the largely chromosomal highly-conserved core genome and the accessory genomes. Notably, the core genome varies between studies [94] because it is composed of genes present in all samples for that analysis [95]. The accessory genome typically has much lower sequence conservation and encodes non-essential traits associated with virulence and antimicrobial metabolism [96]. It represents a major component of microbial variation and includes MGEs like plasmids, transposable elements (TEs), pathogenicity islands, and prophages.
E. coli undergo extensive horizontal gene transfer (HGT) [97]: HGT accounts for ~31% of genome-wide variation in ST131 [98]. Genes arising by HGT are frequently associated with duplications, which can permit new functions among the two gene copies [99]. HGT typically occurs through three processes: the primary one is transduction via bacteriophages after cell lysis and invasion via homologous recombination. ST131 acquire AMR through extrachromosomal MGEs from bacteriophages [100], or the integration of prophage DNA into chromosomes [101]. Prophage elements account for over 35% of E. coli CDS variation [102], and encode genes linked to virulence [103], growth during nutrient deficit [104], transcriptional regulation [105], AMR [106] and biofilms [107]. HGT segment length is a function of phage type during generalised transduction. Other options for HGT are the transformation of local DNA from dead cells, or conjugation mediated by plasmids, TEs, integrons or integrating conjugative elements (ICEs). Homologous recombination between ICEs increase their diversity [108]. TEs such as ISs have a high frequency of integration and excision, and mediate both AMR gene composition and their expression rates. Although HGT is generally a function of sequence homology, TEs do not need high levels to insert.
More detailed knowledge of AMR mutations and gene expression rates can help optimise antimicrobial treatments [109,110]. A large number of ST131 genomes have been published. The first was NA114 from an Indian UTI, encompassing a 4936 Kb genome of which 88.4% was coding sequence (CDS) [111]. In comparison, genome JJ1886 from a UTI was 5,130 Kb and genetically more representative of the super-spreading subgroups within ST131 [112]. Notably, JJ1886 contained a chromosomal CTX-M-15 element due to a partial insertion of a Tn3 element into a lambda-like prophage locus. The third was isolate EC958 from the UK was also blaCTX-M-15-positive and provided a 5110 Kb genome somewhat distinct from NA114 and JJ1886 [113], perhaps due to its earlier isolation date. The fourth was carbapenem-resistant H30-Rx sample MNCRE44 from the USA in 2012: it was assembled using long-read sequencing technology as a 5011 Kb genome with six plasmids, two of which encoded multiple beta-lactamase and other AMR genes [114]. It is important to note that the number of genes, tRNAs and rRNAs in NA114 (4875; 67; 3), JJ1886 (5086; 88; 22), EC958 (4982; 89; 7) and MNCRE44 (5407; 86; 22) likely differ due to differences in sequence library preparation and genome assembly: not all genomes are created equal.
The remaining ST131 belong to a group named H30 (or Clade C [117]) defined by their acquisition of a fimH30 allele: the earliest date of appearance of H30 is at least 1997 [118]. Fluoroquinolone-sensitive ST131 was common prior to 2000, but is now rare. This is due mutations in two genes: one encodes DNA gyrase subunit A (gyrA) with mutation D87N (from 2000) on a background of S83L (from 1997) [118]. DNA gyrases are the primary targets of quinolones. Notably, D87N-S83L (known as 1AB) is fluoroquinolone-resistant whereas S83L alone (1A) is sensitive. The other gene encodes DNA topoisomerase 4 subunit A (parC) and mutated sometime before 2000 to fluoroquinoloneresistant allele 1aAB (S80I and E84V). Nested within H30 is H30-R, which accounts for 58% of ST131 and is becoming more prevalent [117]. H30-R is nearly always fluoroquinolone-resistant-but this can be lost [98].
A subgroup of H30-R named H30-Rx is a recent radiation dating to at least 2002 and is nearly always blaCTX-M-positive [98,[117][118][119][120]. Just three genome-wide SNPs distinguish H30-Rx from other H30-R. It is the main driver of increasing rates of blaCTX-M-15, which is the most frequent H30-Rx CTX-M element type and originated from Kluyvera in the early 2000s [121], such that 49% of E. coli  [98]. Such isolates could be essential for determining the origins of H30-Rx. BlaCTX-M-15 is also found less commonly in ST38, ST405 and ST648 [120].

The Key Antimicrobial Resistance Elements in ST131
Numerous core genome mutations are implicated in ST131's AMR [123]. These include regulators of drug efflux, repressor of the marRAB (multiple antibiotic resistance) operon (marR), and repressor of acrAB (achromobactin outer membrane receptor) genes (acrR), that respond to drugs associated with high fitness costs-for example AcrAB helps non-aminoglycoside drug efflux from the cell. Others are L42R in ribosomal gene S12 (rpsL105); a transposon mediating resistance to tetracycline (Tn10); parC1aAB to fluoroquinolones; ciprofloxacin and nalidixate thanks to changes at gyrA (most frequently allele 1AB) [124]. Chromosomal ampC-like cephalosporinases have been found in E. coli but are rare [49].
Plasmids contain genes encoding various beta-lactamases that are grouped into Ambler classes A (blaCTX-M, blaTEM, blaKPC, blaSHV); B (blaVIM, blaNDM, blaIMP); C (blaCMY, ampC-related like blaDHA); D (blaOXA). Classes A/C/D use serine for beta-lactam hydrolysis, whereas B are metallo-beta-lactamases that use divalent zinc ions [130]. BlaCTX-M in class A are the most frequent ESBL in E. coli, followed next by TEM beta-lactamases. The latter are exchanged extensively between species: for example, TEM-52C in E. coli is mediated by an IncI1-g plasmid on a Tn3 transposon [131] that been found in Salmonella. SHV beta-lactamases are less common in E. coli, but are abundant in K. pneumoniae, from which ST131 has acquired other beta-lactamases [132].

How Did ST131 Adapt to Be so Successful?
Bacterial infectious disease is driven in part by rapid mutation rates (~10 -5 per genome per generation in E. coli) [143], short generation times [144], and non-lethal antimicrobial doses due to non-adherence [145]. Remarkably, AMR may have no fitness cost and may even increase fitness in E. coli [146,147]. ST131 is highly resistant without any fitness cost [128], even more so in CA compared to HA ST131 isolates [148]. As outlined, the three key changes define H30-Rx ST131: the acquisition of fimH30 [117], fluoroquinolone-resistance mutations and beta-lactamase element gains [98,117]. In contrast to the single acquisitions of fimH30 and fluoroquinolone-resistance, the type of beta-lactamase varies, and they are lost and recovered frequently. Consequently, antimicrobial treatments have inadvertently created the H30-Rx superbug, and associated fitness costs may be linked with beta-lactamases, but less so for fimH30, gyrA1AB and parC1aAB. As acknowledged above, there are rare cases of fimH30 loss and mutation of gyrA/parC within H30-Rx.

Extensive Recombination and Horizontal Gene Transfer
Extraintestinal E. coli have higher rates of homologous recombination than commensals: this is positively correlated with the number of virulence factors and may assist with replication of segments arising by HGT [149]. ST131 is restricted largely to UTIs and BSIs [150], so this extensive recombination may have originated to cope with these local environments [149]. Yet in spite of this pervasive recombination and HGT, DNA exchange with other bacterial species is rare because of this restriction to UTIs and BSIs, affecting just ~0.4% of the core genome conserved across the E. coli genus [151]. UTI-causing ESBL-positive ST131 acquire resistance through HGT events with other ST131 colonising the same individual, but rarely from other E. coli phylogroups [55]. ST131 in poultry retains a distinct set of blaCTX-M and blaTEM-52 elements despite high overall genetic similarity [152,153]. Naively, this suggests previous HGT with Kluyvera [119] and K. pneumoniae [132] is rare. However, examples of plasmids derived from K. pneumoniae in ST131 [154] is consistent with a proposal of alarming potential genetic flexibility in ST131.

Maintenance of a Broad Resistome
ST131 regulates the activity of its resistome: this is its entire complement of its AMR genes. There is evidence of functional robustness and redundancy in the resistance mechanisms of E. coli's "proto-resistome" comprising penicillin-binding proteins, cell wall modifying enzymes and cell division genes [155]. Expression of 61 genes is associated with small increases in tolerance to 86 antimicrobial-related compounds [156]. Many potential AMR genes remain unknown in E. coli [157], even though their activities vary even within putatively clonal cell populations [158] and the mechanism of action of many antimicrobials remains unclear [159]. A sequencing approach could assess this in a comprehensive manner.

Regulatory Fine-Tuning of Gene Expression
Regulatory sequences alter AMR through gene expression levels [160] including bursting [161] and transitions between pathogenic and non-pathogenic states [127]. Promoters evolved to alter gene expression rates in response to rapidly changing environmental conditions [162], which can affect AMR phenotypes by up to 10 6 -fold [163]. Promoter gene expression regulation is higher at non-essential genes with lower sequence conservation, and their activity rates are more variable than those of essential genes [164]. During multi-drug exposure, 30% of E. coli expression variation is attributed to promoter mutations [165]-for example, a mutation T32A at a beta-lactamase ampC gene promoter elevates expression rates genome-wide [166].

Cross-Antimicrobial Resistance and Compensatory Mutations
ST131's success may be enhanced by synergy between different antimicrobials [167]-however, antagonism is frequent, even to the point that a Harvey Effect of simultaneous synergy and antagonism is possible. Some compensatory mutations are specific to individual antimicrobials but others promote cross-AMR simultaneously [168]. Feedback-based cross-AMR in response to combination treatments is a function of the number of unlinked (positive) gene regulatory networks [169]. AMR is reduced by negative epistasis during cyclical treatment, which decreases resistance to both drugs more effectively than to single drugs-nearly as well as dual drug therapies [170]. Certain compensatory mutations alter AMR without being clearly associated with the phenotype-for instance, blaCTX-M-linked carbapenem-resistance requires water channel (porin) gene loss to mitigate fitness costs [171]. Additionally, structural rearrangements at ISs are associated with rapid increases in fitness during longterm E. coli evolution [172].

Resistance through Cell Growth Arrest
Antimicrobials limit E. coli growth [173], which is controlled by gene expression [174]. Growth depends on environmental conditions and is controlled by the concentration of transcription-associated proteins [175]. Even without a specific stress, cells spontaneously enter and exit a quiescent non-growth state during both log and stationary growth phases [176]. Lowered expression is more strongly associated with non-essential genes like toxins [164]. There are (so far) 22 known toxinantitoxin systems in E. coli K12 that can be recruited during environmental stress to regulate growth, affecting biofilm formation [177]. For example, in the hipBA toxin-antitoxin system, antitoxin HipB represses serine-threonine-protein kinase HipA but has a short half-life. When HipB levels fall, HipA phosphorylates a glutamyl tRNA synthetase to mimic starvation [178], reducing translation and slowing growth, resulting in a non-growing phenotype [179].

Evolutionary Phylogenomics and Fitness Measurement
Pathogens resist antimicrobials by evolving: consequently, analysis of evolution is needed [180] which should operate on a genome-wide basis for outbreaks like ST131 to more finely resolve of evolutionary patterns at all potential AMR factors [181]. A direct application of this would to investigate the evolution of fluoroquinolone-resistance, because D87G (rather than D87N in ST131) in gyrA in Salmonella promotes sensitivity to other non-quinolone compounds, perhaps due to DNA coiling and this gene expression effects [182]. Genome-based evolutionary analysis and real-time diagnostic evaluation is now as cheap and time-effective as conventional approaches [183] and informative bioinformatics strategies underpin improvements in clinical microbiology [184].
Variation arising by recombination between strains can be deduced from the genome-wide DNA mutation density distribution [181]: signatures of HGT have much higher polymorphism rates, reflecting the mutational time spent in a different species. For H30-Rx, this includes MGEs, integrated phage DNA and capsule-related genes [185]. Comparing the most likely genealogy at a given panel of SNPs (haplotype) to the mean signal across the genome can identify ancestral and recent recombination between and within subpopulations [186] and determine the phylogeographic origin of specific recombination events in a coalescent framework [187]. Phylogenetic and genealogical histories at regions arising via HGT deviate substantially from the mean genome-wide phylogenetic reconstruction [188,189].
New beneficial variants increase in frequency faster than average [190] and retain a more recent time of origin [191]. Phylogenetic branching topology can measure relative fitness assuming a persistent source of antimicrobial pressure causing mutations with small fitness effects [192]. The latter is valid for E. coli because many genes regulate AMR and virulence [156,193]. No variation in relative fitness would mean low variability in the rates of coalescence of descendent nodes compared with ancestral ones. In contrast, a high difference in relative fitness would mean highly fit ancestors will produce many more descendent nodes. This would appear as a radiating population: such as H30 within ST131, H30-R within H30, and H30-Rx with H30-R. So what will follow H30-Rx?

Modelling Historical and Future Epidemiology
Systematic monitoring of infection and transmission in human populations can be used to infer sources of ST131, including from potential reservoirs like livestock and intestinal colonisers [61]. Current hypotheses pinpoint the Indian subcontinent as a possible original reservoir for ST131. During the discovery of ST131 in 2008, there was a high rate of recent travel to this region in those with H30-Rx ST131 infection in New Zealand [194] and Canada [195,196]. Whether this was caused by both a single H30-Rx clone and through plasmid-mediated HGT is unclear [197] due to the limited sampling of the Indian subcontinent prior to 2002. Phylogenomic reconstruction of the evolutionary relationships would address this puzzle but requires extensive geographic sampling including samples during the acquisition of fimH30 in the 1990s and fluoroquinolone-resistance sometime before 2002.
Genome sequencing can help address the rise of AMR [38] by predicting future resistance. The most likely future dominant subtype of a pathogen is typically the most fit one [198]: this can be applied to ST131 based on de novo and known mutations at key AMR determinants using evolutionary methods developed for viruses [199]. The relative rate of allele frequency change reflects its association with antimicrobial selection pressure [200], and can be measured from time-series genome data [201] to distinguish this effect from random drift [202]. Such sampling over extended periods allows the identification of genes implicated in adaptation based on the mutation's age, functional effects, and ancestral phylogenetic position [203].
Phylodynamic models determine future evolutionary and epidemiological patterns based on phylogenetic tree shape [204]. Such schemes have revealed missed events during transmission of tuberculosis [205], pin-pointed an asymptomatic spreader of methicillin-resistant S. aureus between babies [206], assessed the potential for multiple simultaneous outbreaks to originate from the same common ancestor [207], and examined evidence for multiple origins of infection [208]. Although such methods have not yet been applied to E. coli, they can enhance our understanding of the inter-host spread of AMR. For instance, gyrA1AB E. coli with a marR gene deletion are much more ciprofloxacin tolerant if they express a plasmid qnrS1 (quinolone resistance) gene as well in vitro [209].

Assessing Cell Growth Arrest
Stochastic phenotype switching is common to all domains of life, is accentuated by antimicrobial stress, and in bacteria is observed as changes in gene expression. This growth-dormancy bistability is an evolutionary bet-hedging strategy that results in a mix of cells with varying gene expression levels [210]. It arises due to gene regulatory network structure, leading to rare but occasional switches from the growing wild-type (WT) state to a non-growing dormant one. Cell-cell interactions may affect variation in gene expression rates [211], allowing this randomness to adaptively drive regulatory changes.
Cell dormancy causes large shifts in metabolic gene expression [212]. This contrasts with ribosomal gene expression that has buffering mechanisms [174,213] and so scales linearly with cell growth [214]. E. coli alter their carbon metabolism rapidly [215] by co-regulating enzymes to cope with rapidly changing environmental conditions [216]. However, metabolic flux is regulated extensively [217,218] so differing concentrations can produce the same net metabolic effects [219,220] or affect other pathway components [221]. E. coli optimise gene expression by moderating ribosome production [222]: this means shifts in metabolic gene expression can be compared to those for ribosomal RNA, which should match the antimicrobial toxicity and bacterial AMR level. AMR-driven changes in gene expression vary non-monotonically with dose and are a function of the complexity of the mechanism: antimicrobials inactivating targets may elicit higher target expression, whereas drugs causing gain-of-function changes may cause mutation [110]. This means the relative rates of metabolic and ribosomal gene activity provide sufficient information to deduce AMR levels and types.

Avoiding Partial Treatments and Cell Population Heterogeneity
Suboptimal treatment regimens [34], poor compliance [223] and drug pharmacokinetics [224] lead to spatial structure in antimicrobial concentrations. This is frequent in human and animals both at an individual level [225] and in groups of treated and untreated patients [226]. At low antimicrobial doses (such as a minimal selective concentration [227]), WT and resistant cells grow equally well, whereas only resistant cells would survive when exposed to dose greater than the minimal inhibitory concentration (MIC). Partial treatment regimens accelerate AMR for mechanisms requiring numerous mutational steps with small fitness costs by offering a sub-MIC sanctuary where resistant cells can evolve further despite transiently lowered fitness [226]. However, partial regimens can slow AMR if it requires few mutational steps or substantial temporary falls in fitness [228]. In addition, the mutational time required for resistance emergence may be a function of cell density [229].
Spatial or temporal variation in treatment application allows WT cells more time to accrue new mutations, causing faster AMR evolution [230]. These genetically resistant cells are known as type 1 persisters to differentiate them from type 2 persisters that are resistant due to gene expression control causing non-growth [231]. Although E. coli type 2 persister gene activity depends on adjacent cells [165], they revert to growth after dormancy ends, and attain exponential growth rates once resistant mutations occur when all WT cells have died [232]. These type 2 persisters may require a genetic predisposition [231] and initially comprise just 0.001% of cells [233].

Dissecting Measurably Evolving Infections
A single ST131 infection may become a genetically diverse cell community descended from a recent common ancestor driven by antimicrobial selective pressure. Multiple populations could exist: for example S. aureus measurably evolves (30 SNPs and 4 indels in 16 months) to magnify virulence [234]. Furthermore, distinct Burkholderia dolosa lineages co-exist within individual cystic fibrosis patients for years and have undergone extensive clonal interference during exposure to antimicrobials [235]. Soft selective sweeps during host adaptation of Pseudomonas aeruginosa are prevalent, indicating different loss-of-function mutations may target the same gene within a heterogeneous cell community [236].
Consequently, deep genome sequencing is required to grapple with mutations in a cell population [237]. Such population mixes could be described using birth-death [181], logistic growth [238] and two-level population models within a single infection [239] or a structured environment [240] to optimise antibiotic treatment protocols [241]. However, there are caveats: high-dose antimicrobial treatments may give rise to a single dominant variant within the host [110], rare variants may become more advantageous at differing rates as the host microenvironment changes [190], and mutants may be pre-existing rather than de novo [124].

Conclusion: Future Avenues for ST131 Infection Genomics
Genome sequencing can predict virulence, toxicity and AMR phenotypes in E. coli ST131 [242]. It is a pivotal tool for infection control because it facilitates the decoding of molecular mechanisms of treatment resistance [243] and can dissect outbreaks of monomorphic bacteria [244]. It can decipher transmission routes and inter-host contact [181], and evaluate genetic diversity within hosts over time [245]. This evolutionary approach should become the basis for analysing E. coli ST131 outbreaks. A closing point is that there are several other possibilities for genome-based AMR control not discussed here: E. coli post-transcriptional processing disturbs expected correlations between mRNA and protein levels [246]-this includes AMR-driven regulatory changes by small RNAs [247]. Additionally, epigenetic heterogeneity within E. coli cell populations screened using long-read sequencing methylation data [248] is associated with extensive regulatory control of phenotypes [249]. Finally, activity changes in a small set of genes can predict AMR, such that integration of genome-wide and transcriptome profiling is an effective approach to understand AMR levels [250].