Globally Disseminated Multidrug Resistance Plasmids Revealed by Complete Assembly of Multidrug Resistant Escherichia coli and Klebsiella pneumoniae Genomes from Diarrheal Disease in Botswana

: Antimicrobial resistance is a disseminated global health challenge because many of the genes that cause resistance can transfer horizontally between bacteria. Despite the central role of extrachromosomal DNA elements called plasmids in driving the spread of resistance, the detection and surveillance of plasmids remains a signiﬁcant barrier in molecular epidemiology. We assessed two DNA sequencing platforms alone and in combination for laboratory diagnostics in Botswana by annotating antibiotic resistance genes and plasmids in extensively drug resistant bacteria from diarrhea in Botswana. Long-read Nanopore DNA sequencing and high accuracy basecalling effectively estimated the architecture and gene content of three plasmids in Escherichia coli HUM3355 and two plasmids in Klebsiella pneumoniae HUM7199. Polishing the assemblies with Illumina reads increased base calling precision with small improvements to gene prediction. All ﬁve plasmids encoded one or more antibiotic resistance genes, usually within gene islands containing multiple antibiotic and metal resistance genes, and four plasmids encoded genes associated with conjugative transfer. Two plasmids were almost identical to antibiotic resistance plasmids sequenced in Europe and North America from human infection and a pig farm. These One Health connections demonstrate how low-, middle-, and high-income countries collectively beneﬁt from increased whole genome sequencing capacity for surveillance and tracking of infectious diseases and antibiotic resistance genes that can transfer between animal hosts and move across continents.


Introduction
Antimicrobial resistance (AMR) threatens the future availability of therapies to cure and prevent infectious diseases. Antimicrobial treatment failures result in increased health burdens and medical costs, elevating morbidity and mortality rates. Of critical concern is the declining effectiveness of medications to treat bacterial infections, especially in lowand middle-income (LIMCs) countries where bacterial disease burdens are greatest, and antibiotics can help protect child and maternal health. Bacterial resistance to antibiotics is estimated to become the greatest challenge in healthcare, with Africa forecast to suffer the largest negative impacts [1,2].
A driving force of the resistance pandemic is the ability of many antibiotic resistance genes (ARGs) to transfer between bacteria [3]. The high rates and facility with which these 'acquired' resistance genes mobilize between bacteria make them moving targets that do not conform to standard epidemiological tracking or respond to classic interventions. Genetic elements called plasmids are primary vectors for ARG mobility. Because plasmids replicate independently of the bacterial chromosome, they can be inherited vertically by daughter cells or can be transferred horizontally to adjacent cells in bacterial communities [3]. Prime examples include the carbapenem-resistant and extended-spectrum ß-lactamase encoding plasmids in Enterobacteriaceae that spread globally across clinical and non-clinical environments soon after first detection [4]. Another important feature of ARGs is that they can coalesce into gene islands. Subsequent acquisition of a multi-resistance island by a bacterium contributes directly to the emergence of multi-drug resistant cell.
The genetic composition of plasmids changes through high rates of recombination that add, subtract, and shuffle genes. The plasticity of plasmid DNA sequences, the modularity of gene cassettes, and the abundance of repetitive elements in plasmids hampers the assembly of whole plasmids from short-read whole genome and metagenomic datasets [5,6]. Consequently, plasmid epidemiology is poorly understood. Long-read DNA sequencing platforms are making strides towards assembling whole plasmids for genomic epidemiology [7], but the most accessible platforms from Oxford Nanopore are still hindered by lower quality of base calling.
Characterizing and tracking plasmids is a necessary step in the global effort to identify, track, and mitigate the spread of AMR. Whole genome sequencing of bacterial isolates enhances AMR surveillance at national and international scales, and has been successfully applied in LMICs to identify drivers of AMR [8][9][10][11]. Botswana is a developing country in southern Africa with high prevalence and burdens of communicable diseases [12]. Current infectious disease diagnostics in Botswana rely primarily on traditional culture-based methods for isolation and characterization of infectious bacteria. Studies of antibiotic resistance in Botswana rely on conventional microbiological culture and PCR methods to isolate antibiotic resistant bacteria and detect ARGs [13][14][15]. Cost and training barriers have impaired deployment of next generation DNA sequencing in LMICs. These barriers include limited availability of sequencing platforms, a lack of standardized definitions and quality control metrics, and a shortage of computer capacity [16]. In Botswana for example, DNA sequencing is contracted to South African service labs. An exception is the use of Oxford Nanopore sequencing at the Botswana-Harvard AIDS Institute Partnership (BHP) reference laboratory, but this is solely for HIV genotyping. Consequently, Botswana researchers rely on international collaborations to conduct genome sequencing projects while Botswana public health lacks access to next generation DNA sequencing.
The primary objective of this study was to compare assembly and annotation workflows using Illumina and Oxford Nanopore sequence data for accurate identification and characterization of plasmids and ARGs. We also wished to address the deficit of sequenced and assembled multidrug resistance plasmids in Africa. The study was conducted using multidrug resistant bacterial isolates from human infections in the Chobe district in Northern Botswana, which is globally recognized as a wildlife tourist destination that attracts many international visitors. From our assessment of the cost and effectiveness of DNA sequencing methodologies for deployment in LMICs, we present the benefits of implementing DNA sequencing for identification of antibiotic resistant bacteria and their plasmids in Botswana.

Bacterial Isolation and Antibiotic Susceptibility Testing
Two extensively multi-drug resistant bacterial isolates were acquired from patients hospitalized for diarrheal disease in Chobe District, Botswana in 2012. These fecal samples were collected as part of a long-term study of diarrheal disease and antimicrobial resistance in Chobe District [17][18][19][20]. This study was conducted under permit from the Botswana Ministry of Environment, Natural Resources Conservation and Tourism (EWT8/36/4) and the Botswana Ministry of Health and Wellness (HPSME:13/18/1 Vol. X [878]). Approval was also obtained from the Virginia Tech Institutional Review Board (#11-573). Fecal samples used in this study represent archived, anonymized laboratory samples collected from patients presenting to the primary hospital and clinics in Chobe District, Botswana. All study activities were carried out in accordance with relevant guidelines and regulations.

Whole Genome Sequencing
Genomic DNA was extracted using a BioBasic EZ-10 spin column genomic DNA miniprep kit (BS423). Short-read library preparation was conducted using the NEBNext Ultra II DNA library kit for Illumina (E7645) and the NEBNext ® Multiplex Oligos for Illumina (E7335). All steps were conducted according to the NEB protocols. HUM3355 was sequenced on the Illumina MiSeq platform using Reagent Kit v2 (300-cycle, Illumina Inc.), producing 2 × 150 bp paired-end reads. HUM7199 was sequenced on the MiSeq platform using Reagent Kit v3 (600-cycle, Illumina Inc., San Diego, CA, USA), producing 2 × 300 bp paired-end reads.
For long-read sequencing, the HUM3355 DNA was sheared using g-TUBEs (Covaris Inc., Woburn, MA, USA) following the manufacturer's protocol to obtain fragments averaging around 8 kbp in length, whereas the HUM7199 DNA was not sheared by g-TUBE. Long-read sequencing libraries were prepared using the SQK-LSK109 ligation sequencing kit and the EXP-NBD103 native barcoding expansion kit (Oxford Nanopore Technologies, Oxford, UK). Libraries were sequenced on a MinION R9.4.1 flow cell. All library preparation and sequencing steps were conducted according to the Oxford Nanopore Technologies protocols. Basecalling was performed by Guppy v4.2.2 using the HAC model (dna_r9.4.1_450bps_hac.cfg), followed by demultiplexing.

Sequence Analysis, Visualization, and Annotation
The two multi-drug resistant bacterial isolates were identified to species level by submitting the genomes assembled from Illumina sequencing to the Public databases for molecular typing and microbial genome diversity (PubMLST; https://pubmlst.org, accessed on 1 July 2019) [30]. De novo assembled whole genome sequences were visualised using Bandage [31].
Plasmid assembly alignments were created with BRIG [32]. The whole genome assembled contigs were imported into the Pathosystems Resource Integration Center, PATRIC (https://patricbrc.org/, accessed on 26 July 2021) for comprehensive genome analysis [33]. Plasmid contigs from polished assemblies were annotated with the Rapid Annotation for Subsystem Technology (RAST) server [34]. Resistance gene identifier (RGI) software on the Comprehensive Antibiotic Resistance Database (CARD) was used to predict and analyze the resistome of each whole genome assembly [35]. Identification of acquired ARGs was conducted in ResFinder database [36]. Replicon typing was conducted using PlasmidFinder server available online at the Center for Genomic Epidemiology (http://www.genomicepidemiology.org/services/, accessed on 1 July 2019) [37]. MOB groups were predicted by MOBScan [38], and plasmid taxonomic units were assigned by COPLA [39]. Visual annotated physical maps of plasmids were generated using the BLAST ring image generator [32] and Vector NTI 10.3.0 (Invitrogen Corporation, Carlsbad, CA, USA).

Comparison of Illumina and Nanopore DNA Sequence Data for Whole Genome Assemblies
Whole genome sequencing was conducted for two multi-drug resistant bacterial isolates, HUM3355 and HUM7199, that were isolated from human diarrhoea. Quality trimmed read data from Illumina (short-read) and Nanopore (long-read) platforms were assembled using dedicated workflows optimized for each type of read data ( Figure S1). The Public databases for molecular typing and microbial genome diversity (PubMLST) designated HUM3355 as Escherichia coli and HUM7199 Klebsiella pneumoniae.
Visualization of the de novo genome assemblies was conducted using Bandage (31). Assemblies from the Nanopore data resolved singular circular chromosomes and multiple extrachromosomal elements in both species ( Figure 1A). Assembly of Nanopore reads predicted a 4,714,922 bp chromosome plus four circular extrachromosomal elements in HUM3355, though the smallest contig was discarded as spurious because not a single read in the in the Illumina dataset matched sequence in this contig. Assembly of Nanopore reads predicted a 5,106,117 bp chromosome plus two circular extrachromosomal elements in HUM7199. Assemblies based on Illumina-only data predicted genome lengths of 4,859,045 bp for E. coli HUM3355 and 5,395,117 bp for K. pneumoniae HUM7199, fragmented in 250 and 2149 contigs, respectively, with a low number of potential connections between contigs in both genomes ( Figure 1B).
The higher error rates of third generation (long-read) DNA sequencing can be corrected by polishing a long-read assembly using higher accuracy second-generation (short-read) DNA sequencing data from the same genome. Pilon is a program that uses short-read datasets to polish assemblies by correcting bases and fixing small-scale mis-assemblies [23]. Polishing the Nanopore assemblies with the Illumina data had only minor impact on the overall assemblies and annotations ( Figure 1A); the bioinformatic workflow is illustrated in Figure S1. Hybrid polishing the HUM3355 genome made only very slight changes to the estimated sizes of replicons. Hybrid polishing of the HUM7199 genome reduced the overall estimated size by 3656 bp (less than 0.07%), yet retained the three replicon architecture resolved by Nanopore alone. The higher error rates of third generation (long-read) DNA sequencing can be corrected by polishing a long-read assembly using higher accuracy second-generation (shortread) DNA sequencing data from the same genome. Pilon is a program that uses shortread datasets to polish assemblies by correcting bases and fixing small-scale mis-assemblies [23]. Polishing the Nanopore assemblies with the Illumina data had only minor impact on the overall assemblies and annotations ( Figure 1A); the bioinformatic workflow is illustrated in Figure S1. Hybrid polishing the HUM3355 genome made only very slight changes to the estimated sizes of replicons. Hybrid polishing of the HUM7199 genome reduced the overall estimated size by 3656 bp (less than 0.07%), yet retained the three replicon architecture resolved by Nanopore alone.

Whole Genome Comparative Analysis and Detection of Antibiotic Resistance Genes
Oxford Nanopore Technologies sequencing has the distinct advantage of being able to resolve plasmids in bacterial genomes. However, historically lower basecalling accuracy raises concerns about the quality of variant identification and gene annotations. We compared the Comprehensive Antibiotic Resistance Database (CARD) predictions of ARGs in the hybrid polished, Nanopore only, and Illumina only genome assemblies.

Whole Genome Comparative Analysis and Detection of Antibiotic Resistance Genes
Oxford Nanopore Technologies sequencing has the distinct advantage of being able to resolve plasmids in bacterial genomes. However, historically lower basecalling accuracy raises concerns about the quality of variant identification and gene annotations. We compared the Comprehensive Antibiotic Resistance Database (CARD) predictions of ARGs in the hybrid polished, Nanopore only, and Illumina only genome assemblies. There was high agreement between the hybrid polished and Nanopore only assemblies ( Figure 2). In the hybrid polished E. coli HUM3355 genome, CARD identified 40 perfect hits, 36 of which were also identified in the Nanopore only assembly. CARD also identified 36 of the same genes in the Illumina only assembly, but only 27 as perfect matches to the CARD references. CARD identified 13 perfect ARG hits in the polished and Nanopore only HUM7199 assemblies (Figure 2), three of which were not detected in the Illumina only assembly.
Both genomes contained sufficient antibiotic resistance genes to account for each resistance phenotype (Tables 1 and 2). The large number of predicted resistance genes in E. coli HUM3355 were distributed across the chromosome and plasmids. In fact, the chromosome encoded genes predicted to confer resistance to all six classes of antimicrobial classes tested. Plasmid pHUM3355_A encoded 21 predicted resistance genes, also covering all classes of antibiotics tested. Fluoroquinolones present an interesting case for efforts to predict phenotypes from gene annotations: achieving clinical resistance to fluoroquinolones requires the additive activities of multiple resistance genes and usually requires a chromosomal resistance allele [40]. Thus, the four genes in E. coli HUM3355 that can contribute to fluoroquinolone resistance can be interpreted to potentiate resistance but likely only reduce sensitivity to ciprofloxacin (Table 1). K. pneumoniae HUM7199 was found to encode far fewer resistance genes on its chromosome. In this strain, the smaller plasmid pHUM7199_B encoded most resistance genes.
The COPLA universal plasmid classification scheme classified the plasmids as follows. pHUM3355_A (PTU-C, host range V), pHUM3355_B (PTU-FE, host range III), pHUM3355_C (PTU-I1, host range III), pHUM7199_A (PTU-FK, host range III), and pHUM7199_B (PTU-R, host range I). Host range V suggests that pHUM3355_A has a very broad host range and can transfer between bacterial Classes. pHUM3355_B, pHUM3355_C, and pHUM7199_A host range III suggests an ability to transfer between bacterial Families. A host range prediction of I suggests pHUM7199_B may be restricted to transfer within the species K. pneumoniae.  All five plasmids carry maintenance and stability genes associated with either partitioning, addiction, anti-restriction, or SOS inhibition mechanisms (Table S1). Three plasmids carry operons encoding genes associated with heavy metals: mercury (merE/D/S/C/R), copper (cusS/R/C/F/B/A, copA/B/C, PcoR/S/E), arsenic (arsR/H), nickel (nikA/B/C/D/E/R) and iron (fecI/R/A/B/C/D) (Figure 3).
pHUM3355_C aligned to plasmid pRHB17-C09_3 (96,066 bp; accession CP057698) from E. coli from a pig fecal sample in the United Kingdom and plasmid p12-4374_96 (96,042 bp; accession CP012929) from Salmonella Heidelberg from human stool isolated in Canada ( Figure 4A). Comparison of pHUM3355_C to other plasmids revealed that the β-lactamase gene bla CMY-2 is located within a 3649 bp accessory region.
K. pneumoniae pHUM7199_A contains one cluster of mobile elements and heavy metal resistance that is highly conserved across the top 10 BLAST hits in GenBank for this plasmid ( Figure 4B). This region is composed of mobile elements with genes associated with metals copper and arsenic (cusS/R/C/F/B/A, copB/C/D/R/S/E, arsR/H/D/A/B). The second resistance gene cluster is an accessory region (63,308 bp) that is not conserved in the top 10 most closely related plasmid sequences. This region comprises several genes conferring resistance to beta-lactams (bla SCO-1 , bla TEM-1β , bla CTX-M-15 ) and genes associated with heavy metals mercury, nickel and iron.
pHUM7199_B has a large 36,410 multi-resistance region that is shared with three other plasmids, all isolated from K. pneumoniae: CP009116 from human urine in the USA, and two larger unnamed plasmids CP063009 from human tissue in Russia and CP063012 from human blood in Russia. This region includes the class-1 integron-integrase and multiple resistance gene clusters; [intI1-acc(6)-Ib-cr-aar-3-drfA27-aadA16-qacEdelta-sul1], [tetD-tetR-catA2], [blaTEM-1β-aph(6)-Id-aph(3)-Ib-sul2] and [merE/D/A/C/P/T/R] ( Figure 4B). step in understanding the mechanisms and epidemiology of antimicrobial resistance [36]. Resistance phenotypes alone are insufficient for pathogen identification and source tracking during an outbreak because of extensive overlap in the resistance phenotypes between unrelated bacteria. Equally complicating is that the genetic determinants of resistance can transfer horizontally from reservoirs in non-pathogenic bacteria to pathogens as well as transfer between pathogens.
The 'high accuracy calling' model for Oxford Nanopore Technologies MinION sequencing exceeded our expectations and past experience by generating closed, high quality genome assemblies. Previously we have relied on the high accuracy of Illumina sequencing to correct the relatively low accuracy of Nanopore sequencing; however, with the high accuracy Nanopore assembly, polishing provided only minor refinements. Both genomes were reduced in size by polishing, but gene annotation improved as observed with an increase in perfect matches to ARGs in the CARD database (Figures 1 and 2). Despite the relatively low genome coverage in the Illumina datasets, CARD identified many of the same genes detected in the polished genomes, but some of these were only strict, not perfect, matches to the CARD reference genes ( Figure 2). Nanopore sequencing demonstrated the additional capability of resolving multi-copy TEM-1 in both HUM3355 and HUM7199, whereas Illumina only assemblies cannot distinguish multiple identical copies ( Figure 2).
The 'One Health' paradigm recognizes the mobility of ARGs through interconnected human, animal and environmental systems [42][43][44][45][46]. Our study emphasizes the importance of One Health considerations for combating AMR. In two extensively drug resistant bacteria isolated from human diarrhoea in a small town in northern Botswana, we sequenced and assembled five antimicrobial resistance plasmids that contain gene islands previously identified in human and animal systems on other continents. Two plasmids, pHUM3355_B and pHUM3355_C are almost identical to plasmids isolated independently from humans and pigs in North America and Europe. With few comparator sequences available it is impossible to infer where and when the common ancestors of these plasmids originated or how they became co-resident in E. coli in northern Botswana. The Chobe district is a low traffic but highly prized tourist destination for Europeans and Americans, prompting us to speculate that tourism has transported these resistance plasmids into and/or out of the Chobe district to other corners of the planet.
LMICs will continue to face the greatest burdens of infectious diseases and AMR. The democratization of DNA sequencing through rapid technological advances attracts much attention. Yet, DNA sequencing is only one component of genomic-informed infectious disease epidemiology. A need for computer language and coding abilities to execute the more popular DNA assembly packages makes bioinformatics a hurdle in public health and research labs around the world. Free-to-use web-based platforms are effectively reducing barriers by providing popular workflows in graphical user interfaces. We used the Centre for Genomic Epidemiology (https://www.genomicepidemiology.org, accessed on 1 July 2019) extensively for its multiple predictive tools. The Bacterial and Viral Bioinformatics Resource Centre (www.BV-BRC.org) provides high quality and easy to use workflows for genome assembly and annotation using Illumina or Oxford Nanopore Technologies DNA sequence data. Additionally, restrictive in LMICs is the need for sufficient computer infrastructure; fortunately, assembly tools can be launched on most desktop computers and platforms like BV-BRC provide the necessary computational capacity.
In this study we detected plasmid-borne AMR genes in both E coli HUM3355 and K. pneumoniae HUM7199 associated with resistance to seven clinically relevant antimicrobial classes: aminoglycoside, beta-lactam, chloramphenicol, rifamycin, sulfonamide, trimethoprim, tetracycline, and quaternary ammonium compounds. Additionally, plasmids pHUM3355_B and pHUM7199_B were found to carry class 1 integron elements, which are gene capture elements associated with global dissemination of ARGs [47]. All three plasmids identified in E. coli were predicted to be self-transmissible and belong to different incompatibility groups. pHUM3355_A belongs to the IncC incompatibility group, a large group of broad-host range conjugative plasmids that are the main drivers of global dissemination antibiotic resistance among several pathogenic species of Gammaproteobacteria associated with food products, food-producing animals, and humans [48]. The top 100 blast hits (>60% Query Coverage, >99% Identity) from query sequence of pHUM3355_A correspond to resistance plasmids isolated from a wide range of genera within phylum Proteobacteria, including Edwardsiella, Proteus, Klebsiella, Escherichia, Providencia, Enterobacter, Vibrio, Yersinia, Salmonella, Aeromonas, and Photobacterium. Plasmid pHUM3355_B corresponds to plasmids in Escherichia and Salmonella, with closely related plasmids identified in pathogenic strains of E. coli in the UK [49] and S. enterica serovar Typhimurium in Ghana [50]. PlasmidFinder identified a truncated IncQ replicon region in pHUM3355_B; the 529 bp match to the full length (796 bp) replicon is annotated as non-functional according to PlasmidFinder. Plasmid pHUM3355_C is a typical IncI1α plasmid, containing highly conserved tra and pil conjugative transfer regions. Similar plasmids carrying the signature beta-lactamase CMY-type gene have been described globally [51]. The top blast hits from query pHUM3355_C correspond to plasmids circulating in Escherichia, Shigella, and Salmonella.
The multi drug resistant K. pneumoniae HUM7199 carried one predicted conjugative plasmid (pHUM7199_A, IncFIB(K)/IncFII(K)) and one non-mobile plasmid (pHUM7199_B, IncR). The IncFIB (also denoted IncFIB K ) is known to be widespread among Klebsiella strains playing a role in the dissemination of antimicrobial resistance and virulence genes [52]. Top BLAST hits of the query pHUM7199_A are exclusively found in K. pneumoniae, potentially reflecting the high number of plasmids that have been sequenced in Klebsiella. The IncR plasmid replicon was first discovered in 2006 in multi-drug resistant K. pneumoniae plasmid pK245 [53], and this plasmid type is observed to allow co-selection of both virulence and resistance functions [54]. pHUM7199_B shares a genetic backbone composition related to other previously described IncR plasmids, which also appear to be non-conjugative [53][54][55].

Conclusions
Plasmids are key epidemiological markers for tracking the emergence and dissemination of clinically relevant ARGs in human, animal, and ecosystem, but they are largely overlooked because most projects use Illumina's short-read DNA sequencing, which is insufficient for assembling most plasmids. Combining Illumina and Nanopore sequencing approaches proved superior for accurate resolution of whole genome architecture and improved ARG calling. As Botswana and other African nations begin to apply whole genome sequencing of pathogens for public health, this study validates the benefits of combined short-and long-read DNA sequencing data, but also demonstrates that the advances in Oxford Nanopore Technologies sequencing can provide robust assemblies and high-confidence detection of ARGs even in the absence of supporting Illumina data.