Plasmid Identification and Plasmid-Mediated Antimicrobial Gene Detection in Norwegian Isolates

Norway is known for being one of the countries with the lowest levels of antimicrobial resistance (AMR). Bacteria can acquire AMR through horizontal transfer of resistance genes located on mobile elements such as transposons or conjugative plasmids, which enables them to withstand antibiotics. In this work, bioinformatic analysis of whole-genome sequences and hybrid assembled data from Escherichia coli and Klebsiella pneumoniae isolates from Norwegian patients was performed. To detect putative plasmids, the plasmid assembly mode of SPAdes was used, and the resulting contigs were annotated using PlasmidFinder and two curated plasmid databases (Brooks and PLSDB). ResFinder and the Comprehensive Antibiotic Resistance Database (CARD) were used to identify antibiotic resistance genes (ARGs). IncFIB plasmids were the most prevalent plasmids in both E. coli and K. pneumoniae isolates. The most abundant plasmid-mediated ARGs were aph(3″)-Ib, aph(6)-Id, sul1, sul2, and tet(D) in Norwegian E. coli isolates and qnrS1 in K. pneumoniae isolates. Using hybrid assembly, we were able to locate plasmids and predict ARGs more confidently. In conclusion, plasmid identification and ARG detection from whole-genome sequencing data depend heavily on the database of choice; it is therefore best to use several tools and/or hybrid assembly to obtain reliable identification results.


Introduction
Antimicrobial resistance (AMR) is the ability of microorganisms to resist antimicrobial treatments, especially antibiotics. Infections caused by AMR bacteria are a threat to modern health care and are responsible for an estimated 700,000 deaths per year globally and 33,000 deaths per year in Europe [1]. Recently, the World Health Organization (WHO) published a list of pathogens for which urgent global action is needed [2]. Extended-spectrum β-lactamase (ESBL)-producing and carbapenem-resistant Enterobacteriaceae are in the Priority 1 (critical) category of this list. There has been a global rise in infections caused by multidrug-resistant clones of Enterobacteriaceae, particularly Klebsiella pneumoniae and Escherichia coli [3].
AMR can arise through various mechanisms, including mutations in chromosomal genes and the acquisition of antibiotic resistance genes (ARGs) from other strains, a process termed horizontal gene transfer (HGT). The sharing of genes through HGT has largely driven the global dissemination of ARGs [4]. The genomes of E. coli and K. pneumoniae are prone to mutation under stress, reflecting a genetic flexibility that allows them to upregulate their intrinsic resistance and to acquire foreign resistance determinants through HGT mediated by mobile genetic elements. These elements, such as plasmids, transposons, integrons, and genomic islands, harbor ARGs [5]. Several plasmid types, such as IncF and IncI1, are known to carry resistance genes in E. coli, K. pneumoniae, and other Enterobacteriaceae [6]. ColE plasmids, which encode colicins with killing activity against other bacteria, are also important [7]. Broad-host-range resistance plasmids are associated with pathogens; for example, a resistance plasmid from Enterobacteriaceae can be transferred to a wide variety of Gram-negative organisms.
Whole-genome sequencing (WGS) is an effective method for tracking the onward transmission of bacteria or the transfer of resistance plasmids between bacteria. It makes it possible to determine an organism's entire DNA sequence at low cost in a short time, which allows antimicrobial resistance to be identified and outbreaks to be detected early or investigated epidemiologically [8]. However, plasmid assembly and characterization from WGS data is difficult, because plasmids tend to contain repeat sequences longer than the reads generated by short-read platforms such as Illumina (San Diego, CA, USA) [9]. In silico plasmid detection is also needed because plasmids longer than 50 kbp are difficult to purify [10]. In addition to efficient plasmid identification tools, ARG databases with comprehensive and accurate gene records are needed to assess AMR prevalence. Although several ARG databases exist, the Comprehensive Antibiotic Resistance Database (CARD) and ResFinder are the most effective and have sustainable curation strategies [11]. Recent studies have shown that hybrid assemblies, which combine Illumina and long-read sequencing data (e.g., from Oxford Nanopore Technologies' MinION), are better at identifying plasmids and ARGs [12]. However, this requires advanced bioinformatic and machine learning methods for WGS data analysis [13][14][15].
Globally, AMR is unevenly distributed. Klein et al. recently investigated the drug resistance index (DRI) for 41 countries [16], comparing reported data on antibiotic use with resistance in infections caused by microorganisms on the WHO priority list [2]. Norway has the third-lowest DRI, around four-fold lower than that of the country with the highest DRI, India. However, AMR cases in Norway are increasing. For example, the proportion of ESBL-producing E. coli causing septicemia has increased ten-fold over the last ten years [17]. The prevalence of ESBL production was 6.6% in 2017 and 6.5% in 2018 for E. coli, and increased from 5.3% to 6.6% over the same period for Klebsiella spp. [17].
In this study, we used different tools and databases to identify plasmids and predict plasmid-mediated ARGs in E. coli and K. pneumoniae isolates. Our results indicate that plasmid identification and ARG prediction depend on the database and tool used, and that hybrid assembly is an efficient way to identify plasmids and predict plasmid-mediated ARGs.

Sample Collection and Characterization
In this study, E. coli and K. pneumoniae isolates were collected from blood and urine specimens of Norwegian patients in collaboration with Oslo University Hospital. An overview of the samples is given in Table 1.

Library Preparation and Whole-Genome Sequencing
The WGS data used in this study come from our recent work, performed at Oslo University Hospital [18]. In brief, DNA was isolated from bacterial colonies using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's instructions and quantified using a Qubit fluorometer (Life Technologies, Carlsbad, CA, USA). Libraries were constructed using the Nextera XT kit (Illumina Ltd., San Diego, CA, USA) according to the manufacturer's recommendations and sequenced in paired-end mode (2 × 300 bp) on the Illumina MiSeq platform at the Norwegian Sequencing Center (Oslo, Norway). To make hybrid assemblies, we additionally sequenced three isolates (E. coli 39 and K. pneumoniae 23 and 37) on both the Oxford Nanopore and the Illumina MiSeq platforms. Details of library preparation, sequencing, and hybrid assembly have been reported previously [12]. All bioinformatic analyses were identical for plasmid and hybrid assemblies and were performed as described below.

Quality Control and Trimming of Illumina Sequences
Initially, Illumina and Nanopore reads were quality checked using FastQC (v 0.11.8 for Linux) [19]. Illumina adapters were then removed, and low-quality reads (Phred quality score below 25) were filtered out using Trimmomatic with default parameters [20]. Before downstream analyses, the trimmed reads were quality checked again using FastQC.
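The Phred threshold above can be made concrete with a minimal Python sketch. It assumes Phred+33 quality encoding (the convention of current Illumina FASTQ output) and, as a simplification, drops a read when its mean Phred score falls below 25. This is not Trimmomatic's own algorithm, which applies dedicated trimming and filtering steps; the function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

# (header, sequence, quality string) for one FASTQ record
Record = Tuple[str, str, str]


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def mean_phred(quality: str) -> float:
    """Mean Phred score of a read (0.0 for an empty quality string)."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores) if scores else 0.0


def filter_reads(records: Iterable[Record], min_mean_q: float = 25.0) -> Iterator[Record]:
    """Yield reads whose mean Phred score is at least min_mean_q; drop the rest."""
    for header, seq, qual in records:
        if mean_phred(qual) >= min_mean_q:
            yield header, seq, qual


# 'I' encodes Phred 40 and '5' encodes Phred 20 under Phred+33
reads = [("r1", "ACGT", "IIII"), ("r2", "ACGT", "5555")]
kept = [header for header, _, _ in filter_reads(reads)]
```

In practice the filtering is done by Trimmomatic itself; the sketch only shows what a per-read threshold of 25 means in terms of the encoded quality characters.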

Plasmid Assembly and in Silico Plasmid Identification
Putative plasmid sequences were assembled using the plasmid mode (--plasmid flag) of SPAdes (v 3.14.1 for Linux) [21]. General statistics of the assembled putative plasmids were assessed using QUAST (v 4.6.0 for Linux) [22]. Putative plasmid sequences were further confirmed using PlasmidFinder (software version 2.0.1, database version 2020-07-13) with minimum identity and coverage of 95% and 60%, respectively [23]. In addition to PlasmidFinder, putative plasmids were identified using two other references: plasmid sequences downloaded from the PLSDB database [24] and a curated database developed by Brooks et al. [25], hereafter referred to as Brooks. The assembled putative plasmids were searched with BLAST (sequence identity >95%, word size 28) against these reference plasmid databases. Hits (contigs) with coverage between 30% and 100% were first extracted, and only hits with qcov ≥90% (hereafter P_TRUE) were retained for downstream analysis. Here, qcov is the unique query coverage per subject, calculated after accounting for overlaps between the different fragments aligned to that specific subject in the database.
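The two-stage BLASTn filter used to call P_TRUE can be sketched as follows. The sketch assumes that BLASTn HSPs have already been parsed into records (the Hsp field names are hypothetical), applies the >95% identity cut-off per HSP, and computes coverage for each contig and reference plasmid as the union of HSP intervals on the contig, so that overlapping fragments are counted once, following the description of qcov above. Search parameters such as the word size belong to the BLAST call and are not shown. This is an illustration under those assumptions, not the script used in this study.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hsp:
    contig: str       # query: assembled putative plasmid contig
    plasmid: str      # subject: reference plasmid (PLSDB or Brooks)
    pident: float     # percent identity of this HSP
    qstart: int       # 1-based start of the alignment on the contig
    qend: int         # 1-based end of the alignment on the contig
    contig_len: int   # full contig length


def merged_coverage(intervals: List[Tuple[int, int]], length: int) -> float:
    """Percent of a sequence covered by the union of possibly overlapping intervals."""
    merged: List[List[int]] = []
    for start, end in sorted((min(a, b), max(a, b)) for a, b in intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping or adjacent: extend
        else:
            merged.append([start, end])
    covered = sum(e - s + 1 for s, e in merged)
    return 100.0 * covered / length


def call_p_true(
    hsps: List[Hsp],
    min_ident: float = 95.0,
    min_pre_cov: float = 30.0,
    min_qcov: float = 90.0,
) -> Tuple[Dict[str, List[Tuple[str, float]]], Dict[str, List[str]]]:
    """Stage 1 keeps contig-plasmid pairs with coverage >= 30%; stage 2 calls P_TRUE at >= 90%."""
    groups: Dict[Tuple[str, str], List[Hsp]] = {}
    for h in hsps:
        if h.pident > min_ident:  # identity cut-off (>95%) applied per HSP
            groups.setdefault((h.contig, h.plasmid), []).append(h)
    candidates: Dict[str, List[Tuple[str, float]]] = {}
    p_true: Dict[str, List[str]] = {}
    for (contig, plasmid), group in groups.items():
        cov = merged_coverage([(h.qstart, h.qend) for h in group], group[0].contig_len)
        if cov >= min_pre_cov:
            candidates.setdefault(contig, []).append((plasmid, cov))
            if cov >= min_qcov:
                p_true.setdefault(contig, []).append(plasmid)
    return candidates, p_true


hsps = [
    Hsp("c1", "pA", 99.0, 1, 60, 100),
    Hsp("c1", "pA", 98.0, 50, 95, 100),   # overlaps the first HSP; counted once
    Hsp("c1", "pB", 99.5, 1, 40, 100),
    Hsp("c2", "pA", 90.0, 1, 100, 100),   # fails the identity cut-off
]
candidates, p_true = call_p_true(hsps)
```

Here contig c1 is called P_TRUE for pA (95% merged coverage), while pB (40%) passes only the first stage.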

Identification of Plasmid-Mediated Antimicrobial Resistance Genes (ARGs)
To identify ARGs hosted by plasmids, the assembled putative plasmids of each isolate were submitted to ResFinder 4.0 [26] and to the Resistance Gene Identifier tool of the Comprehensive Antibiotic Resistance Database (CARD) [27]. For both ResFinder and CARD, only hits with ≥95% identity and ≥98% length coverage were considered ARGs. Only hits located on the same contig as a P_TRUE were then regarded as true plasmid-mediated ARGs (ARG_Plasmid). Depending on the plasmid reference, we hereafter refer to ARG_Plasmid-PlasmidFinder (ARG and P_TRUE from PlasmidFinder on the same contig), ARG_Plasmid-Brooks (ARG and P_TRUE from Brooks on the same contig), and ARG_Plasmid-PLSDB (ARG and P_TRUE from PLSDB on the same contig).
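The ARG_Plasmid rule (identity and length-coverage thresholds, followed by a same-contig check against P_TRUE) can be sketched as below. The ArgHit record and the contig-to-P_TRUE mapping are hypothetical inputs; parsing of ResFinder or CARD output is not shown. Running the function once per plasmid reference (PlasmidFinder, Brooks, PLSDB) would give the three ARG_Plasmid groups defined above.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class ArgHit(NamedTuple):
    contig: str       # contig on which ResFinder/CARD reported the gene
    gene: str         # ARG name
    identity: float   # percent identity to the reference ARG
    coverage: float   # percent of the reference gene length covered


def arg_plasmid(
    arg_hits: Iterable[ArgHit],
    p_true: Dict[str, List[str]],
    min_identity: float = 95.0,
    min_coverage: float = 98.0,
) -> Dict[str, Set[str]]:
    """Map each accepted ARG to the P_TRUE plasmids found on the same contig."""
    result: Dict[str, Set[str]] = {}
    for hit in arg_hits:
        if hit.identity < min_identity or hit.coverage < min_coverage:
            continue  # not accepted as an ARG
        for plasmid in p_true.get(hit.contig, []):
            result.setdefault(hit.gene, set()).add(plasmid)
    return result


hits = [
    ArgHit("c1", "sul2", 99.0, 100.0),
    ArgHit("c2", "tet(D)", 99.0, 100.0),   # contig without a P_TRUE
    ArgHit("c1", "qnrS1", 94.0, 100.0),    # below the identity threshold
]
calls = arg_plasmid(hits, {"c1": ["IncFII", "IncFIB"]})
```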

Results
In the current study, putative plasmid sequences and ARGs were identified in silico in E. coli and K. pneumoniae isolates from Norwegian patients. In addition, hybrid assemblies of three additional isolates were analyzed.

General Statistics of Assembled Plasmid Sequences and Hybrid Assembled Sequences
The general statistics for the assembled plasmid sequences and hybrid assembled sequences are shown in Table 2. We observed more and larger contigs for E. coli than for K. pneumoniae isolates. The GC content was similar between E. coli and K. pneumoniae, and the N50 values (i.e., the length of the shortest contig among the largest contigs that together cover 50% of the assembly) were higher in E. coli, indicating larger plasmid contigs and thus better assembly contiguity. The hybrid assembled isolates generally had larger contigs and higher N50 values (Table 2). Interestingly, the GC content was higher in both K. pneumoniae isolates than in the E. coli isolate.

Table 2. An overview of general statistics (mean ± SD) obtained using the QUAST tool for the plasmid assembled data and the hybrid assembled isolates (E. coli 39 and K. pneumoniae 23 and 37).
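For reference, the N50 and GC-content statistics reported by QUAST can be computed in a few lines of Python. This is a minimal sketch, not QUAST's implementation; it assumes contig lengths and sequences are already available, and it excludes ambiguous bases (e.g., N) from the GC denominator, which is an assumption about how such bases should be handled.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Largest length L such that contigs of length >= L cover at least half the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def gc_percent(seq: str) -> float:
    """GC content (%) over unambiguous bases; N and other IUPAC codes are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt if acgt else 0.0
```

For contigs of 100, 80, 50, 30, and 20 bp (280 bp in total), the two longest contigs already cover 180 bp, more than half of the assembly, so the N50 is 80 bp.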

In Silico Plasmid Validation
Assembled plasmid sequences were further validated with the PlasmidFinder online tool and with BLASTn searches against the PLSDB and Brooks plasmid databases. Using PlasmidFinder, we identified plasmid replicons in 39 (67%) E. coli and 11 (25%) K. pneumoniae isolates, corresponding to two to three and one to two plasmid replicons per plasmid-carrying isolate in E. coli and K. pneumoniae, respectively (Table S1).
The number of putative plasmids (P_TRUE) per isolate after BLASTn and removal of duplicate hits is shown for both E. coli and K. pneumoniae in Figure 1A. Overall, more P_TRUE were detected for E. coli than for K. pneumoniae (Table S1). For E. coli, most P_TRUE detected with the Brooks database (122 of 173) were also detected with PLSDB. In K. pneumoniae, almost all P_TRUE detected with Brooks (29 of 30) were also detected with PLSDB. Additionally, 13 plasmids shared between E. coli and K. pneumoniae were detected using PLSDB, whereas only three shared plasmids were observed using the Brooks database. The Neighbor-Joining phylogenetic tree of IncFIB plasmids, which were the most prevalent in both E. coli and K. pneumoniae, is presented in Figure 1B. The tree is based on a MAFFT (Multiple Alignment using Fast Fourier Transform) alignment of conserved plasmid regions [28]. There were four IncFIB sequences from K. pneumoniae: two formed a separate branch, whereas the other two clustered with IncFIB plasmid sequences from E. coli. An overview of the 20 most abundant putative plasmids (P_TRUE) and replicons retrieved from each database for both E. coli and K. pneumoniae is shown in Table 3.
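The Neighbor-Joining method behind Figure 1B can be illustrated with a textbook implementation that, at each step, joins the pair minimizing the Q-criterion Q(i, j) = (n - 2)d(i, j) - Σk d(i, k) - Σk d(j, k). The sketch starts from a precomputed distance matrix; the distance measure derived from the MAFFT alignment and the software used to build the tree in this study are not specified here, so both are assumptions. The five-taxon matrix is a standard textbook example, not study data.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Build an unrooted Neighbor-Joining tree and return it as a Newick string."""
    if len(names) < 3:
        raise ValueError("neighbor joining needs at least three sequences")
    d: Dict[Tuple[str, str], float] = {
        (a, b): dist[i][j] for i, a in enumerate(names) for j, b in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[(a, b)] for b in nodes) for a in nodes}  # row sums
        # Q-criterion; min() keeps the first pair encountered when values tie
        a, b = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p] - r[p[0]] - r[p[1]])
        la = 0.5 * d[(a, b)] + (r[a] - r[b]) / (2 * (n - 2))  # branch a -> new node
        lb = d[(a, b)] - la                                   # branch b -> new node
        u = f"({a}:{la:g},{b}:{lb:g})"                        # new node as a Newick subtree
        rest = [c for c in nodes if c not in (a, b)]
        for c in rest:
            d[(u, c)] = d[(c, u)] = 0.5 * (d[(a, c)] + d[(b, c)] - d[(a, b)])
        d[(u, u)] = 0.0
        nodes = rest + [u]
    # Join the last three nodes at a single central node
    i, j, k = nodes
    li = 0.5 * (d[(i, j)] + d[(i, k)] - d[(j, k)])
    lj = 0.5 * (d[(i, j)] + d[(j, k)] - d[(i, k)])
    lk = 0.5 * (d[(i, k)] + d[(j, k)] - d[(i, j)])
    return f"({i}:{li:g},{j}:{lj:g},{k}:{lk:g});"


names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(names, dist)
```

For this matrix the first join merges a and b (branch lengths 2 and 3), and the function returns (d:2,e:1,(c:4,(a:2,b:3):3):2); which reproduces the additive input distances exactly.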
For the hybrid assembled sequences, we retrieved more P_TRUE with PLSDB than with Brooks or PlasmidFinder. All plasmids detected with Brooks for E. coli 39 and K. pneumoniae 23 were also detected with PLSDB (Table S2). An overview of the top five P_TRUE and replicons retrieved from each database for the hybrid assembled isolates is shown in Table 4.

Table 3. Cont.
IncM1 1 | p3PCN033 2 | pF2_18C_Col 3
IncQ1 1 | pMVAST0167_1 2 | pDB4277 3
IncX1 1 | pKPN-7c3 2 | pCERC4 3
p0111 1 | pEC732_6 2 | pCERC5 3

Identification of Plasmid-Mediated ARGs
Plasmid assembled files were used to explore plasmid-mediated ARGs with the ResFinder and CARD databases. As shown in Table 5, when all ARGs in the plasmid assembled data were considered, regardless of whether they were located on P_TRUE, CARD identified more ARGs than ResFinder in E. coli isolates, whereas the opposite was observed for K. pneumoniae. The number of predicted ARGs also differed depending on the plasmid database used for annotation (for E. coli isolates, ARG_Plasmid-PlasmidFinder > ARG_Plasmid-PLSDB > ARG_Plasmid-Brooks; for K. pneumoniae isolates, ARG_Plasmid-PLSDB > ARG_Plasmid-PlasmidFinder > ARG_Plasmid-Brooks). In the hybrid assembled data, CARD likewise detected more ARGs for E. coli 39, and the same was observed for K. pneumoniae 37 (Table S2). Surprisingly, no ARG_Plasmid-Brooks were detected in either the plasmid or the hybrid assembled K. pneumoniae data.

As is apparent from Table 5, ARG_Plasmid-PlasmidFinder in the E. coli plasmid data was the group with the highest number of detected ARGs. Further details about ARG_Plasmid-PlasmidFinder are given in Figure 2 and Table 5. In the E. coli plasmid data, most ARG_Plasmid were found on IncFII plasmids, whereas plasmids such as Col(pHAD28) and IncI1-1(Gamma) hosted the fewest ARGs. Some ARGs, such as aph(3″)-Ib, aph(6)-Id, blaTEM-1B, and sul2, were carried by more than one type of plasmid.

Table 5 (partial).
PlasmidFinder: 31 (51%), 27 (27%), 1 (3%), 1 (6%), 6 (54%), 6 (11%), 14 (58%), 12 (50%), 6 (54%), 5 (29%)
Brooks: 10 (16%), 10 (10%), 18 (29%), 16 (16%), 3 (10%), 2 (13%), 7 (77%), 7 (13%), 7 (29%), 6 (25%), 6 (54%), 5 (29%)
Figure 2. The co-existence of ARG_Plasmid genes and different plasmids detected by PlasmidFinder in plasmid data from E. coli isolates. Numbers inside each cell indicate the number of isolates in which the ARG was found on the corresponding plasmid.
The most abundant ARG_Plasmid genes in the plasmid data from E. coli and K. pneumoniae isolates are listed in Table 6. In E. coli isolates, most ARG_Plasmid hits carried by P_TRUE from PlasmidFinder and PLSDB were the β-lactamase gene variants blaTEM-1B and TEM-1B. For E. coli, ARG_Plasmid genes such as aph(3″)-Ib, aph(6)-Id, sul1, sul2, and tet(D) were shared ARG_Plasmid, observed with all databases. For K. pneumoniae, no ARG_Plasmid gene was detected on P_TRUE from Brooks; however, qnrS1 was a shared ARG_Plasmid harbored by P_TRUE from both PLSDB and PlasmidFinder.
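The cell values in Figure 2 (isolates per plasmid and ARG pair) and the isolate counts in Table 6 are counts of distinct isolates, so an ARG found on several contigs of the same isolate should be counted once. A minimal sketch, assuming the ARG_Plasmid calls have been collected as (isolate, plasmid, gene) tuples; the isolates and records below are hypothetical, not study data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# One ARG_Plasmid call: (isolate, plasmid, gene)
Call = Tuple[str, str, str]


def cell_counts(calls: Iterable[Call]) -> Dict[Tuple[str, str], int]:
    """Number of distinct isolates per (plasmid, gene) cell of a co-occurrence heatmap."""
    cells: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for isolate, plasmid, gene in calls:
        cells[(plasmid, gene)].add(isolate)
    return {cell: len(isolates) for cell, isolates in cells.items()}


def gene_counts(calls: Iterable[Call]) -> Dict[str, int]:
    """Number of distinct isolates carrying each gene on any plasmid."""
    genes: Dict[str, Set[str]] = defaultdict(set)
    for isolate, _plasmid, gene in calls:
        genes[gene].add(isolate)
    return {gene: len(isolates) for gene, isolates in genes.items()}


calls = [
    ("Ec1", "IncFII", "sul2"),
    ("Ec1", "IncFII", "sul2"),   # duplicate contig hit in the same isolate
    ("Ec2", "IncFII", "sul2"),
    ("Ec2", "IncFIB", "sul2"),
    ("Ec3", "IncFIB", "tet(D)"),
]
```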
ARG_Plasmid prediction using the hybrid assembled sequences is presented in Table 7. Overall, ARG_Plasmid prediction was more consistent between databases for the hybrid assembled sequences than for the plasmid assembled data (Table S2). For instance, in the hybrid assembled E. coli 39 isolate, ARG_Plasmid such as aac(3)-VIa and aadA1 were hosted by P_TRUE from all databases. For K. pneumoniae isolate 37, the predicted ARG_Plasmid-PlasmidFinder and ARG_Plasmid-PLSDB matched entirely. For K. pneumoniae isolate 23, all predicted ARG_Plasmid-PLSDB were also among the ARG_Plasmid-PlasmidFinder, which included one additional gene.

Table 6. Gene name and number of isolates with the most abundant ARG_Plasmid for both E. coli and K. pneumoniae isolates.
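Statements such as "matched entirely" or "covered by" amount to set operations on the per-isolate ARG_Plasmid predictions obtained with each plasmid reference. The sketch below reports the genes shared by all references and those exclusive to each; the gene names are hypothetical placeholders, and a subset test such as pred["PLSDB"] <= pred["PlasmidFinder"] expresses the situation described for K. pneumoniae 23.

```python
from typing import Dict, Set


def compare_predictions(pred: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Genes predicted with every reference, and genes exclusive to each reference."""
    if not pred:
        return {"shared_by_all": set()}
    summary: Dict[str, Set[str]] = {"shared_by_all": set.intersection(*pred.values())}
    for db, genes in pred.items():
        others = set().union(*(g for name, g in pred.items() if name != db))
        summary[f"only_{db}"] = genes - others
    return summary


pred = {
    "PlasmidFinder": {"gene1", "gene2", "gene3"},
    "PLSDB": {"gene1", "gene2"},
    "Brooks": {"gene1"},
}
summary = compare_predictions(pred)
```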

Discussion
In the current research, the applicability of three plasmid databases and two antibiotic resistance gene databases was assessed using E. coli and K. pneumoniae assemblies from isolates of Norwegian patients.
In total, we identified 490 and 52 putative plasmids exclusive to PLSDB and Brooks, respectively. These differences may be explained by the content of the databases, which differ in the method used to develop them, the date of their last revision (October 2018 for Brooks and November 2020 for PLSDB), and their size (11,677 and 13,789 entries in Brooks and PLSDB, respectively). Although BLASTn searches against Brooks and PLSDB yielded more putative plasmids than PlasmidFinder, the approach has disadvantages. For instance, the BLASTn search returned multiple hits with similar lengths, alignment coverage, and percentage identity for the same contig, which made assigning putative plasmids as P_TRUE challenging. Similar challenges following BLAST+ have been described previously for the FindPlasmid package [29]. PlasmidFinder, in contrast, accepts raw files from sequencing platforms and performs de novo assembly automatically (although the assembly results are not presented by the tool), so no prior assembly is required. When using other databases such as Brooks and PLSDB, de novo assembly must be performed manually before the BLASTn search. However, one disadvantage of PlasmidFinder is that it currently covers only Enterobacteriaceae and a few Gram-positive bacterial species.
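The ambiguity described above (several reference plasmids matching one contig with near-identical length, coverage, and identity) can at least be detected automatically. The sketch below flags contigs whose best hit is matched, within tolerances, by other hits. The Hit record and the tolerance values are hypothetical choices, and the sketch only flags ties for review; it does not resolve them.

```python
from typing import Dict, Iterable, List, NamedTuple


class Hit(NamedTuple):
    contig: str
    plasmid: str
    length: int     # alignment length (bp)
    qcov: float     # query coverage (%)
    pident: float   # percent identity


def ambiguous_contigs(
    hits: Iterable[Hit],
    len_tol: float = 0.02,   # relative alignment-length tolerance (hypothetical)
    cov_tol: float = 1.0,    # coverage tolerance, percentage points (hypothetical)
    id_tol: float = 0.5,     # identity tolerance, percentage points (hypothetical)
) -> Dict[str, List[str]]:
    """Flag contigs whose best hit is matched, within tolerances, by other reference plasmids."""
    by_contig: Dict[str, List[Hit]] = {}
    for h in hits:
        by_contig.setdefault(h.contig, []).append(h)
    flagged: Dict[str, List[str]] = {}
    for contig, group in by_contig.items():
        best = max(group, key=lambda h: (h.qcov, h.pident, h.length))
        ties = [
            h.plasmid for h in group
            if abs(h.qcov - best.qcov) <= cov_tol
            and abs(h.pident - best.pident) <= id_tol
            and abs(h.length - best.length) <= len_tol * best.length
        ]
        if len(ties) > 1:
            flagged[contig] = sorted(ties)
    return flagged


hits = [
    Hit("c1", "pA", 5000, 99.0, 99.5),
    Hit("c1", "pB", 4990, 98.5, 99.3),   # near-identical to pA
    Hit("c1", "pC", 3000, 60.0, 97.0),
    Hit("c2", "pD", 8000, 95.0, 99.9),   # single hit, unambiguous
]
flagged = ambiguous_contigs(hits)
```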
Predicting plasmid-associated ARGs after plasmid identification is clinically relevant. In this study, ResFinder performed better than CARD in predicting plasmid-associated ARGs (ARG_Plasmid-PlasmidFinder, ARG_Plasmid-Brooks, and ARG_Plasmid-PLSDB) for both plasmid and hybrid assembled data. In a study comparing resistance gene databases, CARD and ResFinder performed equally when a single gene sequence was submitted, but CARD performed slightly better for assembled data [30]. CARD accepts only FASTA assembly files of up to 20 Mb, but in addition to acquired gene information, it also contains chromosomal mutation data. ResFinder, in contrast, accepts raw files, so assembly is not required, and users can choose between acquired genes and chromosomal mutations. One advantage of ResFinder is that it flags hits with the term "true circular", which indicates whether a hit is plasmid-associated. Our data therefore suggest using PlasmidFinder and its associated ResFinder online tool as the first choice for predicting plasmid-associated ARGs.
In the current study, the ARG_Plasmid profiles differed between E. coli strains carrying plasmids of the same type. Similar results have been reported for Salmonella enterica isolates in Ghana [31]. This highlights the mobility of genetic elements between plasmids, through which the ability to resist antimicrobials can be acquired or lost. IncF plasmids are known carriers of a broad spectrum of antibiotic resistance genes in E. coli [32][33][34]. In line with this, IncFII plasmids were strongly associated with various resistance genes in our study. These plasmids carried TEM-1B, aph, sul, tetA, and dfr genes, conferring resistance to penicillins, aminoglycosides, sulfonamides, tetracyclines, and trimethoprim, respectively [35]. IncFIB plasmids were the most prevalent in our dataset but showed a low association with antibiotic resistance genes: aad (aminoglycoside), sul (sulfonamide), and tet (tetracycline) genes were located on IncFIB plasmid contigs in two cases, reflecting the low, albeit growing, antibiotic resistance in Norway [16]. However, phylogenetic analysis of IncFIB plasmids revealed that plasmid sequences were shared between E. coli and K. pneumoniae, possibly indicating inter-species transfer, which raises concern over rising antibiotic resistance in Norway [17]. Additionally, we documented the co-existence of blaCTX-M genes with genes conferring resistance to sulfonamides, aminoglycosides, trimethoprim, and tetracyclines. This agrees with previous reports that plasmids harboring blaCTX-M genes frequently also carry genes encoding resistance to other antimicrobials [36][37][38].
The high error rate of Oxford Nanopore Technologies reads and the mismatch between the short reads of the Illumina MiSeq platform and the large repetitive regions in plasmids often result in inaccurate prediction of plasmid-mediated ARGs. Hybrid assembly has been suggested to overcome this issue [12,39]. In the present research, we found that the prediction of ARG_Plasmid following hybrid assembly was more consistent across databases. A less fragmented assembly, in which circular plasmids are apparent, makes ARG_Plasmid prediction more accurate. A similar conclusion regarding the applicability of hybrid assembly for detecting plasmid-mediated ARGs has been reached previously [40]. Therefore, future work implementing hybrid assembly to identify ARGs in bacteria is worth pursuing.

Conclusions
In conclusion, we have demonstrated that plasmid detection and plasmid-mediated ARG prediction are challenging and that obtaining reliable results requires considering different tools and databases. In the present study, the combination of PlasmidFinder and ResFinder showed promising results for both E. coli and K. pneumoniae isolates. Plasmid detection and prediction of plasmid-mediated ARGs can be facilitated by hybrid assembly. Although Norway is considered a country with a low frequency of antibiotic resistance, the current research provides a reasonable argument for tackling the slightly increasing antibiotic resistance in Norway.

Supplementary Materials:
The following are available online at https://www.mdpi.com/2076-2607/9/1/52/s1: Table S1, an overview of complete results for the plasmid assembled data from different databases; Table S2, an overview of complete results for the hybrid assembled data from different databases.
Funding: This research was funded by the Norwegian Research Council, grant number 273609 to AMR-Diag. The APC was funded by the AMR-Diag grant and by support from the Inland Norway University of Applied Sciences.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to a confidentiality agreement related to the AMR-Diag project.