A Retrospective Whole-Genome Sequencing Analysis of Carbapenem and Colistin-Resistant Klebsiella pneumoniae Nosocomial Strains Isolated during an MDR Surveillance Program

Multidrug-resistant Klebsiella pneumoniae (MDR Kp), in particular carbapenem-resistant Kp (CR-Kp), has become endemic in Italy, where alarming data have been reported on the spread of colistin-resistant CR-Kp (CRCR-Kp). During the period 2013–2014, 27 CRCR-Kp nosocomial strains were isolated within the Modena University Hospital Policlinico (MUHP) multidrug resistance surveillance program. We retrospectively investigated these isolates by whole-genome sequencing (WGS) analysis of the resistome, virulome, plasmid content, and core single nucleotide polymorphisms (cSNPs) in order to gain insights into their molecular epidemiology. The in silico WGS analysis of the resistome revealed the presence of genes, such as blaKPC, related to the phenotypically detected resistances to carbapenems. Concerning colistin resistance, the plasmidic genes mcr 1–9 were not detected, while known and new genetic variations in mgrB, phoQ, and pmrB were found. The virulome profile revealed the presence of type-3 fimbriae, capsular polysaccharide, and iron acquisition system genes. The detected plasmid replicons were classified as IncFIB(pQil), IncFIB(K), ColRNAI, IncX3, and IncFII(K) types. The cSNPs genotyping was consistent with the multi locus sequence typing (MLST) and with the distribution of mutations related to colistin resistance genes. In a nosocomial drug resistance surveillance program, WGS proved to be a useful tool for elucidating the spread dynamics of CRCR-Kp nosocomial strains and could help to limit their diffusion.


Introduction
Klebsiella pneumoniae (Kp) is a Gram-negative bacterium that can colonize or cause infections in hospitalized patients. Multidrug-resistant (MDR) Kp strains show high-level resistance to β-lactams, aminoglycosides, quinolones, tigecycline, and colistin. In particular, the carbapenem-resistant Kp (CR-Kp) pathogen represents a worldwide challenge due to its high mortality rates. It has become endemic in Italy, where there have been several reports of hospital outbreaks [1][2][3][4][5].
Various carbapenem resistance mechanisms have been identified; however, the mechanism that most frequently occurs is related to Kp carbapenemase (KPC) production [6].
The increasing spread of nosocomial MDR Kp has led to the reintroduction of colistin, which is one of the few widely available therapeutic options for CR-Kp infections [7]. As a consequence of this renewed use, the isolation of colistin-resistant CR-Kp (CRCR-Kp) strains has gradually increased in Italy [8,9].
Prior to 2015, colistin resistance had only been linked to mutational and regulatory changes mediated by chromosomal genes [10]. Gene modifications involved in efflux pump component encoding have also been correlated with colistin resistance [11]. Moreover, plasmid-encoded colistin resistance genes have been reported to be transmissible resistance mechanisms in Enterobacteriaceae. The presence of colistin resistance genes in mobile genetic elements poses a significant public health risk, as these genes can spread rapidly by horizontal transfer and require global monitoring and surveillance [12].
Methods for discriminating and characterizing different Kp isolates are essential to the optimization of infection control resources. Systems based on phenotypes (serotype, biotype, or antibiogram) and molecular methods (multi locus sequence typing (MLST), pulsed-field gel electrophoresis (PFGE), and repetitive extragenic palindromic PCR (rep-PCR)) have been used for many years. Despite the increase in discrimination power from MLST to methods that interrogate the entire genome, such as PFGE and rep-PCR, these techniques may not provide sufficient resolution between strains due to Kp's high clonality [13].
This limitation has been overcome with improvements in sequencing technologies. Whole-genome sequencing (WGS) is positioned to become an essential epidemiological and clinical tool for day-to-day infection control and, for some pathogens, a method for detecting the molecular mechanisms that underlie antibiotic resistance, virulence factors, and plasmid diffusion [14].
At Modena University Hospital Policlinico (MUHP), a 677-bed tertiary care hospital in northern Italy, CR-Kp has been continuously isolated since 2008, and a joint infection control and antimicrobial stewardship program was initiated in 2012. During the period 2013-2014, 27 CRCR-Kp nosocomial strains were isolated and investigated by MLST as part of the surveillance and infection control program.
The objective of this study was to use in silico WGS to retrospectively analyze these strains in order to better understand CRCR-Kp spread dynamics. We used web tools and bioinformatics software to characterize the resistome, virulome, plasmid content, and core single nucleotide polymorphisms (cSNPs). WGS data were then correlated to the clinical-epidemiologic context.
The MLST showed that all samples belonged to Clonal Complex (CC) 258. Twenty-five strains were assigned to sequence type (ST) 512, and two strains were assigned to ST258 (Table 1).

Whole-Genome Sequencing and in Silico Data Analysis
The assembly statistics for each genome are reported in Supplementary Table S2.

Resistome Analysis
The data on antibiotic resistance due to the presence/absence of genes and the mutations that were detected in colistin resistance chromosomal determinants are reported in Figure 1. Resistome data. Resistome data obtained by ResFinder-2.1 software, Pasteur MLST Kp database and running BLAST are here summarized. In particular, in the left box, grey and white colors represent gene presence and absence respectively; in the right box the colistin-resistance related mutations are grouped. The revealed genes are grouped with respect to the related resistance mechanism. Concernig colistin resistance, many different mechanisms have been reported. Prior to 2015, colistin resistance had only been linked to mutational and regulatory changes mediated by chromosomal genes controlling lipopolysaccharide (LPS) modifications. In Kp, the LPS modification is mediated by the activation of different two-component regulatory systems (TCRSs): PmrA/PmrB, PhoP/PhoQ, and The revealed genes are grouped with respect to the related resistance mechanism. Concernig colistin resistance, many different mechanisms have been reported. Prior to 2015, colistin resistance had only been linked to mutational and regulatory changes mediated by chromosomal genes controlling lipopolysaccharide (LPS) modifications. In Kp, the LPS modification is mediated by the activation of different two-component regulatory systems (TCRSs): PmrA/PmrB, PhoP/PhoQ, and CrrA/CrrB. TCRS mutations can cause constitutive expression of the pmrCAB and pmrHFIJKLM operons. Moreover, inactivation of the PhoQ/PhoP negative regulator encoded by mgrB has been suggested to play a prominent role in colistin-resistance in Kp [10] Moreover, plasmid-encoded mcr1-mcr9 genes have been reported as a transmissible resistance mechanism in Enterobacteriaceae [12].
The mcr genes absence and the neutral mutations are not presented in the figure. * New mutations in colistin-resistance related genes detected in this study.
The percentage similarity in the alignment between the best-matching resistance gene in ResFinder and the corresponding sequence in the input genome ranged from 97.14 to 100, with a 100% query/high-scoring segment pair (HSP) length.
Concerning genetic determinants related to colistin resistance, the BLAST results and ResFinder analysis revealed several genetic modifications in chromosomal loci and the absence of the plasmidic genes mcr 1-9, respectively. Compared with the Kp-ST512-K30BO and Kp-HS11286 reference sequences, all isolates showed wild-type acrAB, pmrHFIJKLM, crrA, kpnEF, lpxM, phoP, and pmrACD loci. All samples showed two neutral crrB point mutations (the silent A84C and the missense A887T, corresponding to the neutral amino acid change L296Q in the CrrB protein) and two silent pmrD point mutations (T162C and C195T) with respect to the ST11-HS11286 reference sequence.
As shown in Figure 1, significant new alterations were found in mgrB, phoQ, and pmrB. In particular, we found seven new mutations, including a 10-nucleotide deletion in mgrB, a point mutation in the mgrB promoter, an insertion in phoQ, and two point mutations in both phoQ and pmrB.
These new variants were defined by a literature search and a BLAST search, and were found to not match the Kp gene sequences/genomes in the GenBank database. Three different types of non-silent mgrB alterations (deletions, nonsense mutations, and an insertional inactivation) were detected in 13 of 27 isolates. Eight isolates (KpMO1, KpMO14, KpMO20-23, KpMO26, and KpMO27) exhibited a new 10-nucleotide (nt) deletion (∆nt61/70). This deletion causes a frame shift with consequent double amino acid (aa) substitutions (T21L and Q22T) and the production of a 22-aa, C-terminal truncated, and most likely nonfunctional MgrB protein. At the level of the mgrB promoter, a new point deletion, −55∆G, was detected in KpMO7.
The pmrB gene was found to have two new missense mutations (T137A (V46E) in KpMO7 and C284T (P95L) in KpMO14). In KpMO4, no mutation that could explain the colistin resistance was found.
PROVEAN analysis predicted a deleterious impact on the biological protein function of all of the new mutations except one (C1369G in the phoQ gene) ( Table 2). Apart from the new mutations in colistin-related genes, we found some well-known mgrB resistance mechanisms, namely the C88T point mutation, a complete lack of the gene, and an insertional inactivation.
Three samples (KpMO2, KpMO8, and KpMO10) showed the previously known C88T mutation, which generates a premature stop codon and produces a truncated, nonfunctional, and 29-aa-long protein. Moreover, KpMO17 exhibited a 1879-nt mgrB genetic environment deletion, including a large region upstream (part of the gene encoding the major facilitator superfamily protein and the kdgR and yobH genes) and within the mgrB gene (the mgrB promoter and the first mgrB 132/144 coding nt, the ∆ locus).
Finally, in KpMO9, mgrB was found to be completely disrupted by a 1196-nt insertion sequence (IS5-like) at the level of the 74-75 nt positions (best match with the AO-1367 Kp strain, accession No. KP967591.1) [15].

Virulome Analysis
The virulence repertoire was represented by type-3 fimbriae (the mrk operon), capsular polysaccharides (cps cluster genes associated with the K type (K) and the K locus (KL)), and iron acquisition systems (the fyu, irp, and ybt genes) ( Figure 2).
Iron acquisition genes were found in two samples only. A complete yersiniabactin siderophore system (fyuA, irp1, irp2, and the ybt operon) was found in KpMO28, while an incomplete ybt cluster (∆ybtU) was detected in KpMO20.

Plasmid Content Analysis
Five replicon types were globally detected and characterized as IncFII(K), IncFIB(K), ColRNAI, IncX3, and IncFIB(pQil). The percentage similarity in the alignment between the best-matching plasmid in the PlasmidFinder database and the corresponding sequence in the input genome ranged from 97.97 to 100. IncFII(K), IncFIB(K), and ColRNAI replicons were identified in all isolates, while IncFIB(pQil) was not found in KpMO7. The absence of IncX3 was common to KpMO2, KpMO8, and KpMO10 ( Figure 2).

Phylogenetic Analysis
We drew a maximum likelihood cSNPs tree of the 27 samples in order to provide high-resolution strain tracking and discrimination. In total, we identified 1731 SNPs, of which 1002 were shared by all samples.
The cSNPs analysis resulted in two major lineages corresponding to the ST512 and the ST258 samples. The ST512 lineage included two major clusters, A (n = 4) and B (n = 21). The B cluster was grouped into two minor branches, B1 (n = 2) and B2 (n = 19), with the latter consisting of two subgroups, B2a (n = 13) and B2b (n = 6) ( Figure 3).  Figure 2. Virulome data and plasmid content. Virulome data were obtained by Pasteur MLST Kp database; plasmid content data were obtained by PlasmidFinder-1.3. Grey and white colors represent gene presence and absence respectively. ND, Not Defined.

Phylogenetic Analysis
We drew a maximum likelihood cSNPs tree of the 27 samples in order to provide high-resolution strain tracking and discrimination. In total, we identified 1731 SNPs, of which 1002 were shared by all samples.
The cSNPs analysis resulted in two major lineages corresponding to the ST512 and the ST258 samples. The ST512 lineage included two major clusters, A (n = 4) and B (n = 21). The B cluster was grouped into two minor branches, B1 (n = 2) and B2 (n = 19), with the latter consisting of two subgroups, B2a (n = 13) and B2b (n = 6) ( Figure 3).  . Core SNPs analysis data incorporated into the epidemiological metadata. Core SNPs tree (left) shows two major lineages corresponding to the ST512 (blue) and the ST258 (red). Clusters within ST512 lineage are indicated by capital letters (A, B) followed by a number in the minor branches (B1, B2) and by additional lowercase letters in the subgroups (B2a, B2b). The right box represents patients' movements inside the hospital, each color represents a ward. ICU, red; Infectious Disease, cyan, Long Term Care, pink; Medicine II, yellow; Medicine ICU, blue; Nephrology, dark green; Oncology, brown; Orthopedics, light green; Otolaryngology, orange; Pneumology, grey; Transplant Unit, fuchsia. ICU, Intensive Care Unit; ∆, deletion; IS, Insertion Sequence; ins, insertion; wt, wild type. * days 15-31.

The Molecular Data in the Clinical-Epidemiologic Context
The phylogenetic data, MLST results, and mutations detected in colistin resistance genes were incorporated into the epidemiological metadata in order to elucidate the routes of transmission ( Figure 3).
The ST512/cluster A isolates were closely related and shared the same colistin resistance mechanism and plasmid content. The ST512/cluster B isolates were characterized by a high level of molecular and epidemiological heterogeneity, although many of them shared the same colistin-resistance-related mutation. The two ST258 isolates were unrelated to all of the other strains, and each strain had a unique colistin-resistance-related mutation.

Discussion
Our retrospective investigation of 27 CRCR-Kp, isolated at Modena University Hospital Policlinico, by WGS analysis of the resistome, virulome, plasmid content, and cSNPs allowed us to obtain two main results: the identification of new genetic variations in the colistin-resistance related genes and an informative epidemiological picture of the spread of CRCR-Kp in our clinical setting.
The MLST analysis showed that all isolates belonged to the pandemic CC258. These data confirmed the pandemic CC258 s global distribution and its ability to cause hospital outbreaks and disseminate carbapenemase genes [1,6,16]. Nevertheless, the MLST results did not allow us to identify specific relationships or possible patterns of transmission among the studied strains. On the other hand, the WGS data provided us with an adequate amount of information about the evolution of CRCR-Kp circulation in our hospital.
Concerning WGS resistome analysis, in all samples, genes coding for the main classes of antibiotic resistance were found to be in agreement with the susceptibility test.
Regarding genetic determinants related to colistin resistance, we confirmed, as reported by other authors, the absence of the plasmidic genes mcr 1-9 [18]. Instead, both known and significant new genetic modifications were identified in chromosomal loci, specifically in the mgrB, phoQ, and pmrB genes. We identified some well-known chromosomal colistin resistance mechanisms involving mgrB, such as a C88T point mutation [10], a complete lack of the gene [10,18,19], and an insertional inactivation [10,15,19,20].
For all new mutations except one (C1369G in the phoQ gene), the PROVEAN analysis predicted a deleterious impact on the biological protein function, suggesting a possible association between each of these mutations and colistin resistance. This in silico analysis allowed us to overcome one limitation of our study, namely the absence of the trans-complementation tests and evaluation of the mgrB, pmrB, and phoQ expression levels that are commonly used to confirm an association between a new mutation and colistin resistance.
The results of the virulence analysis of the studied samples showed the presence of type-3 fimbriae known to be involved in biofilm formation on biotic and abiotic surfaces of medical devices in a hospital environment [21]. The absence of capsule synthesis loci associated with hypervirulent serotypes (i.e., serotypes K1, K2, and K5) indicates that MDR strains do not currently overlap with hypervirulent CCs [22]. Moreover, we confirmed that KL107 and KL106 were associated with wzi/wzc154/916 and wzi/wzc29/921, respectively [23].
Plasmids, which are considered to be the primary source of Kp gene variability, have been used as molecular markers in epidemiological investigations [24]. However, the homogeneity of our results limits the use of plasmid content as an epidemiological marker, except for KpMO2, KpMO8, and KpMO10, which were found to be defective for IncX3, and KpMO7, which was found to be defective for IncFIB(pQil).
Moreover, the restrictions intrinsic to short-read technologies (e.g., Illumina) in the WGS approach did not allow us to accurately reconstruct the genomic context surrounding the repeated sequences in the plasmids [25]. Nevertheless, the variability in plasmid content found in our collection may provide support to the hypothesis that plasmidic exchanges and arrangements can occur in endemic healthcare settings, generating additional plasmid types.
The incorporation of MLST and WGS data into the clinical-epidemiologic context allowed us to draw some inferences. As CRCR-Kp can spread via person-to-person contact or environmental sources, the WGS analysis allowed us to identify possible transmission patterns.
For example, the ST512/cluster A isolates, which were found to be closely related, could have derived from a common source that was confined to the nephrology ward. Indeed, the KpMO8 strain, which was the first to be isolated from the nephrology ward (January 2013), was found to belong to ST512/cluster A and carry the C88T mutation in the mgrB gene. In the following months, the ST512/cluster A isolates evolved in the same ward as KpMO10 and KpMO2, which were found to be closely related to each other, and shared with KpMO8 the same colistin resistance mechanism and plasmid content. This result was confirmed by the characterization of plasmid content with a unique profile in which IncX3 was absent. The KpMO9 isolate, which first appeared in the intensive care unit (ICU), slightly diverged from the other ST512/cluster A isolates, showing an mgrB insertional inactivation. Moreover, it had the same plasmid content as all other isolates except for KpMO7. This latter isolate was the only one found to be defective for IncFIB(pQil). Therefore, WGS allowed us to differentiate KpMO9 from KpMO8, KpMO10, and KpMO2, while MLST grouped them together.
The ST512/cluster B isolates were characterized by a high level of molecular and epidemiological heterogeneity.
KpMO17 and KpMO3 were isolated from different wards, and although both belonged to B1, they did not share the same variation in colistin resistance determinants.
Ten strains, isolated in five wards during the period November 2013 to January 2014, were characterized by a phoQ 799/801(GAC) insertion and belonged to the B2a subgroup. Eight strains, isolated from six wards, carried the mgrB-∆nt61/70 deletion and were isolated over a long period of time (January 2013 to November 2013). Six of these isolates belong to the B2b subgroup, while two isolates (KpMO1 and KpMO23) belong to the B2a subgroup. In the larger ST512/cluster B, the isolates' molecular heterogeneity did not allow us to confirm the hypothesis of transmission or of exposure to the same hospital source of CRCR-Kp. In particular, the closely related KpMO19/KpMO5 strains were isolated from the ICU at the same time, the KpMO25/KpMO29/KpMO31 strains were isolated from Medicine II, and the KpMO20/KpMO27/KpMO22 strains were isolated from the infectious disease ward.

Study Design
From January 2013 to March 2014, 85 non-duplicated CRCR-Kp strains were isolated from several wards within the MDR surveillance program by means of universal patient rectal swab screening at admission (t = 0) and thereafter at regular intervals (weekly) during hospitalization. The isolates were immediately subjected to both routine phenotypic and traditional genotyping analysis (MLST). Of these 85 strains, 27 strains that were isolated from inpatient rectal screening swabs or clinical samples obtained during hospitalization (starting 48-72 h after admission) were selected and used for a retrospective WGS analysis in order to gain insights into the molecular epidemiology of nosocomial CRCR-Kp strains.
For each patient, we selected the first obtainable CRCR-Kp isolate, favoring more relevant clinical samples (urine, blood, etc.) where available. Table 3 contains information on the isolates and each patient's characteristics; each isolate number corresponds to a particular case.  This study was approved by Modena's provincial ethics committee and registered with protocol no. 2655, 21/July/2016. No written informed consent was obtained from patients as all data were analyzed anonymously after a de-identification process.

Antimicrobial Susceptibility, Carbapenemase Phenotype Detection and MLST
At the time of strain isolation (2013-2014), species identification and antimicrobial susceptibility were determined using the Vitek2 automated system (BioMérieux, Marcy l'Etoile, France). The susceptibility test results and minimum inhibitory concentration (MIC) were interpreted according to the 2012 EUCAST breakpoints criteria [26]. Susceptibility to colistin and susceptibility to tigecycline were verified using the E-test (BioMérieux). Carbapenemase phenotype testing was performed using the KPC+MBL Confirm ID kit (Rosco Diagnostica A/S, Taastrup, Denmark).
To assign sequence types, an MLST analysis was performed using the Pasteur database scheme available for K. pneumoniae (http://bigsdb.pasteur.fr/klebsiella/klebsiella.html).

Whole-Genome Sequencing and in Silico Data Analysis
Genomic DNA for molecular analysis was extracted from the 27 CRCR-Kp isolates using the Maxwell-16 automated DNA/RNA extraction system (Promega, Madison, WI, USA). DNA quality, quantity, and purity were determined using agarose gel, a NanoDrop 8000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA), and an E6150 Quantus™ Fluorometer (Promega).
The samples were fully sequenced by using a next-generation sequencing (NGS) approach on the MiSeq platform (Illumina, San Diego, CA, USA). The Nextera XT DNA protocol was applied with 1-1.5 ng of starting DNA, and sequencing was performed with a v3 kit (600 cycles). Paired-end reads were demultiplexed into separate samples, quality checked by removing adapter sequences and bases (quality score <25) using the FastQC (Babraham Research institute, Cambridge, UK) and Sickle software (https://github.com/najoshi/sickle), and de novo assembled into contigs using the Abyss-pe v1.5.2 program (Canada's Michael Smith Genome Sciences Centre, Vancouver, Canada) (k parameter = 63) or the SPAdes v3.7.0 program (Center for Algorithmic Biotechnology St. Petersburg State University, St. Petersburg, Russia) [27,28]. Contigs longer than 500 bp were selected using an ad hoc script and kept for further analysis.
The 27 nucleotide sequences were deposited at DDBJ/ENA/GenBank under Bioproject ID PRJNA504600.

Phylogenetic Analysis
To establish genetic relatedness among the 27 CRCR-Kp isolates, SNP discovery was performed using the kSNP v3.0 program (Bellingham Research Institute. Bellingham, WA, USA (k-mer = 21). SNP loci were defined by an oligo of k length surrounding a central SNP allele [30]. The maximum likelihood tree based on the cSNPs detected in all genomes was visualized with the Dendroscope v3.2.10 software (Center for Bioinformatics, Tübingen, Germany) [31].

The Molecular Data in the Clinical-Epidemiologic Context
The results obtained from both the MLST and WGS-based analyses were incorporated into the clinical-epidemiological metadata that were collected from the patients' medical records.

Conclusions
Due to the growing importance of MDR Kp, a fast and accurate identification and typing of pathogens is essential for effective surveillance and outbreak detection. We need to know the genetic arrangement related to the antibiotic resistance, to understand population structure in hospital settings, and their relationship.
The most important result of this retrospective WGS analysis was the discovery of new genetic variations involving the mgrB, phoQ, and pmrB genes related to colistin resistance and the absence of the plasmidic mcr gene.
Moreover, this study proved to be useful to a program for monitoring the spread of nosocomial CRCR-Kp strains, and allowed us to confirm what has already been shown in many other studies [32][33][34][35][36]. In particular, the distribution of mutations in colistin resistance determinants was consistent with the cSNPs clustering. Thus, it may serve as a good epidemiological marker. As WGS was more discriminating than MLST, it allowed us to identify possible CRCR-Kp transmission patterns and to obtain a clustering reconstruction that was more consistent with the epidemiological analysis.
WGS produces results with an excellent cost/benefit ratio and a mean measured turnaround time (TAT) of 4.4 days comparable to TAT required for investigations with less discriminatory methodologies [33]. However, WGS-informed outbreak tracking is still usually performed only retrospectively [37].
In our view, the data obtained retrospectively in this study, if available in real-time, could have helped to surveil the alert nosocomial pathogens by directing the infection control team to focus its attention and resources on those wards where WGS would have highlighted transmission events. Furthermore, the future routine clinical implementation in our hospital of real-time WGS would provide the infection control team with reliable and timely data about the emergence and spread of antimicrobial resistance, and hopefully the ability to prevent outbreaks by the rapid application of infection control procedures and the implementation of a targeted antimicrobial stewardship program.
Finally, even if our findings have allowed us to better understand the spread of Kp in hospitals at a local level, they add to the molecular CRCR-Kp epidemiology data at both the national and international levels and contribute to defining a framework for the epidemiology of this pathogen.