The Impact of Insertion Sequences on O-Serotype Phenotype and Its O-Locus-Based Prediction in Klebsiella pneumoniae O2 and O1

Klebsiella pneumoniae is a nosocomial pathogen listed by the World Health Organization (WHO) as "critical" because of highly limited treatment options. Lipopolysaccharide (LPS, O-antigen) and capsular polysaccharide (K-antigen) are its virulence factors and surface antigens; they determine the O- and K-serotypes and are encoded by the O- and K-loci, respectively. They are promising targets for antibody-based therapies (vaccines and passive immunization) as an alternative to antibiotics. To make such immunotherapy effective, knowledge of O/K-antigen structures, drift, and distribution among clinical isolates is needed. At present, the structural analysis of O-antigens is efficiently supported by bioinformatics. O- and K-locus-based genotyping by polymerase chain reaction (PCR) or whole-genome sequencing (WGS) has been proposed as a diagnostic tool, including the publicly available Kaptive tool. We analyzed discrepancies in O2 serotyping between Kaptive-based predictions (O2 variant 2 serotype) and the actual phenotype (O2 variant 1) for two K. pneumoniae clinical isolates. The length discrepancies identified relative to the reference O-locus resulted from insertion sequences (ISs) within the rfb regions of the O-loci. In silico analysis of 8130 O1 and O2 genomes available in public databases indicated a broader distribution of ISs in rfb regions that may influence the actual O-antigen structure. Our results show that current high-throughput genotyping algorithms need to be further refined to consider the effects of ISs on the LPS O-serotype.


Introduction
Klebsiella pneumoniae is a Gram-negative bacterium that is part of the human microbiota; however, it is also a frequent cause of nosocomial and community-acquired infections in newborns, the elderly, and immunocompromised patients [1][2][3][4][5][6]. K. pneumoniae belongs to the ESKAPE group of pathogens (ESKAPE is an acronym for Enterococcus faecium, Staphylococcus aureus, K. pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter spp.) [2,7] and to the top priority list of pathogens of the World Health Organization (WHO).
In this paper, we describe two clinical isolates of K. pneumoniae (strains BIDMC 7B and ABC152) for which Kaptive-based O-serotype prediction and O-antigen structural analysis revealed different O-serotypes. Molecular characterization was performed to explain these genotype-phenotype discrepancies as a result of insertion sequences (ISs) within their rfb regions. Further, a large-scale in silico analysis of 8130 K. pneumoniae genomes available in public databases was performed to assess the prevalence of such insertions in the rfb regions of K. pneumoniae O1 and O2 genomes.

O-Antigen Structures of the BIDMC 7B and ABC152 Strains Represent the O2 Variant 1 O-Serotype
The O-antigen chemical structures of the K. pneumoniae BIDMC 7B and ABC152 clinical isolates were characterized by nuclear magnetic resonance (NMR) spectroscopy of the native isolated LPS. LPS was extracted from the bacteria with yields of 0.32% and 0.70% for BIDMC 7B and ABC152, respectively, and then analyzed by high-resolution magic angle spinning (HR-MAS) ¹H and ¹³C NMR spectroscopy (Figure 2a,b,e). The BIDMC 7B O-antigen was identified as D-galactan I (Figure 2a; Table 1). The same D-galactan I structure was determined for the ABC152 LPS (Figure 2b; Table 1). The lack of a terminal α-D-Galp residue, characteristic of the O2v2 serotype (Figure 2d, C1 signal), was confirmed for both isolates. Their NMR spectra were comparable to those recorded for the K. pneumoniae Kp26 O-PS (O2v1 serotype). The ¹H,¹³C HSQC-DEPT spectra of BIDMC 7B (Figure 2e) and ABC152 overlaid completely, confirming the identity of these O-PSs (Table 1).

Disruption of the gmlB Gene by IS Affects the O-Antigen Phenotype in BIDMC 7B and ABC152
Kaptive-based O-serotyping was performed using the whole-genome sequences of both strains [14]. In contrast to the structural analysis, both O-serotypes were predicted to be O2v2 with high match confidence, according to the Kaptive measures of match quality. However, the BIDMC 7B and ABC152 rfb clusters were 777 bp longer than the reference sequences in the Klebsiella O-locus primary reference database in Kaptive. The gmlA, gmlB, and gmlC genes showed 100%, 90.79%, and 97.33% identity, respectively, to those in the Kaptive reference database.
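To make the notion of a length discrepancy concrete, the minimal Python sketch below assumes it is the signed difference between the assembled locus and its reference, and flags a locus once the absolute difference reaches a threshold. The function names and the 10,000 bp reference length are hypothetical; the 400 bp default mirrors the screening criterion used for the O2v2 genomes in this study and is not presented as a Kaptive setting.

```python
def length_discrepancy(query_len: int, ref_len: int) -> int:
    """Signed length difference (bp) between an assembled locus and its reference.

    Positive values mean the assembled locus is longer, e.g. because of an
    inserted element. This definition is an assumption for illustration.
    """
    return query_len - ref_len


def flag_for_review(discrepancy: int, threshold_bp: int = 400) -> bool:
    """Return True when the absolute discrepancy reaches the threshold.

    The 400 bp default mirrors this study's O2v2 screening criterion;
    it is not a Kaptive parameter.
    """
    return abs(discrepancy) >= threshold_bp


if __name__ == "__main__":
    ref_len = 10_000  # hypothetical reference locus length
    d = length_discrepancy(ref_len + 777, ref_len)  # +777 bp, as for BIDMC 7B/ABC152
    print(d, flag_for_review(d))  # prints: 777 True
```

A symmetric (absolute) threshold is used so that shortened loci, for example after a deletion, would also be flagged.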
Molecular analysis of BIDMC 7B and ABC152 was performed to explain the discrepancies observed between the O-antigen phenotype and the Kaptive-predicted O-serotype. The alignment of the gmlB genes from BIDMC 7B, ABC152, and the O1/O2v2 reference strains K. pneumoniae NTUH-K2044 and 441 is shown in Figure 1b. This comparison shows that gmlB is disrupted in both strains by an identical IS element, ISR1, whereas the other genes in the rfb locus are intact relative to the reference strains. These results indicate that the ISR1 insertion inactivated gmlB, which encodes the GmlB glycosyltransferase, resulting in biosynthesis of the O2v1 instead of the O2v2 structure; this is the likely reason for the discrepancy between the actual O-antigen phenotype and the Kaptive-based prediction.
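How an inserted element shows up in such a pairwise comparison can be illustrated with a toy Python sketch. It assumes the query gene differs from the reference only by one contiguous insertion (no substitutions or other indels), so the insertion is bounded by the longest shared prefix and suffix. Real analyses would rely on a proper sequence aligner; the function name and sequences are hypothetical.

```python
def locate_insertion(reference: str, query: str) -> tuple[int, str]:
    """Locate a single contiguous insertion in `query` relative to `reference`.

    Returns (number of reference bases preceding the insertion, inserted sequence).
    Raises ValueError if the sequences differ by anything other than one insertion.
    """
    extra = len(query) - len(reference)
    if extra <= 0:
        raise ValueError("query is not longer than reference")
    # Longest shared prefix (query is longer, so query[prefix] is always valid).
    prefix = 0
    while prefix < len(reference) and reference[prefix] == query[prefix]:
        prefix += 1
    # Longest shared suffix, capped so it cannot overlap the shared prefix.
    suffix = 0
    while suffix < len(reference) - prefix and reference[-1 - suffix] == query[-1 - suffix]:
        suffix += 1
    if prefix + suffix != len(reference):
        raise ValueError("sequences differ by more than one insertion")
    return prefix, query[prefix:prefix + extra]


if __name__ == "__main__":
    print(locate_insertion("ATGCCGTA", "ATGCTTTTCGTA"))  # prints: (4, 'TTTT')
```

When the inserted bases repeat the flanking sequence, several placements are equivalent and the sketch reports the rightmost one, which is why aligners usually normalize indel positions.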

ISs Occur in O2v2 and O1v2 K. pneumoniae Isolates: An In Silico Study
To assess the occurrence of IS elements in the rfb loci of K. pneumoniae, 8130 genome sequences available in the public domain were analyzed (Supplementary Material Table S1). Based on the Kaptive O-serotyping results (Supplementary Material Table S1, column B), 2281 isolates (≈28%) were predicted to be O2v2 and 839 isolates (≈10%) to be O1v2. Among the O2v2 isolates, 55 genomes (≈2.4%) showed a significant difference in the length of the rfb region (≥400 bp), of which 49 genomes were of sufficient quality for further analysis (Supplementary Material Table S2). Different ISs (e.g., ISR1, IS903B, ISKpn14, and ISKpn26) were identified in several genes of these loci, namely gmlBC, kfoC, wbbMNO, glf, wzm, and wzt (Table 2). In several isolates, the same IS or two different ISs interrupted two genes: gmlB or gmlC together with wbbO, or wbbM together with wbbO. Among the 839 Kaptive-identified O1v2 isolates, significant length discrepancies (≥700 bp) occurred in the rfb region of six isolates (≈0.7%), one of which was excluded due to low read quality (Supplementary Material Table S3). Selected rfb genes of these isolates, namely gmlABC, wbbM, and wzm, were interrupted by IS5, IS102, IS903B, or ISKpn14 (Table 3). In two O1v2 isolates, two genes were disrupted simultaneously, and in one O1v2 isolate, four genes were. In both the O2v2 and O1v2 groups, the same ISs were observed at the same positions of the same genes in several isolates, suggesting close genetic relatedness. The gmlB:ISR1 (nt 818) disruption of the studied isolates BIDMC 7B and ABC152 was also found in four other genomes. To estimate the approximate number of independent IS insertions into the rfb loci of the available K. pneumoniae O2v2 and O1v2 genomes, clonality (MLST) and phylogenetic analyses were performed using the ABC152 strain as a reference (Supplementary Material Tables S2 and S3; Figure 3).
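The screening step and the reported proportions can be reproduced with a short Python sketch. The per-group thresholds (400 bp for O2v2, 700 bp for O1v2) and the counts are taken from the text; the `Genome` record layout and the function names are hypothetical, and absolute differences are assumed so that both longer and shorter loci would be caught.

```python
from dataclasses import dataclass

# Screening thresholds (bp) per Kaptive-predicted O-type, as used in this study.
THRESHOLDS_BP = {"O2v2": 400, "O1v2": 700}


@dataclass
class Genome:
    accession: str           # hypothetical identifier
    o_type: str              # Kaptive-predicted O-type, e.g. "O2v2"
    length_discrepancy: int  # bp relative to the reference O-locus


def screen(genomes: list[Genome]) -> list[Genome]:
    """Keep genomes whose |length discrepancy| meets their group's threshold."""
    hits = []
    for g in genomes:
        threshold = THRESHOLDS_BP.get(g.o_type)  # other O-types are not screened
        if threshold is not None and abs(g.length_discrepancy) >= threshold:
            hits.append(g)
    return hits


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Counts reported in the text: O2v2 and O1v2 among all genomes, then flagged loci.
    shares = [percent(2281, 8130), percent(839, 8130), percent(55, 2281), percent(6, 839)]
    print(", ".join(f"{p:.1f}" for p in shares))  # prints: 28.1, 10.3, 2.4, 0.7
```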
The clonality and phylogenetic analyses confirmed that some individual disruptions within the rfb locus have spread clonally within specific K. pneumoniae lineages, indicating single IS insertion events at their origins. This was demonstrated by clusters of O2v2 ST258 isolates with kfoC:ISR1 (nt 656) or wbbM:ISR1 (nt 1,881) disruptions, ST258 isolates with double gmlB:ISKpn26 (nt 453) plus wbbO:ISKpn26 (nt 490) disruptions, and ST34 isolates with gmlC:IS903B (nt 7,956). In some cases, an additional IS insertion likely marked ongoing diversification within a lineage, such as wbbO:IS903B (nt 130) in ST34 isolates carrying gmlC:IS903B (nt 7,956). An interesting case was the gmlB:ISR1 (nt 818) disruption in the study isolates BIDMC 7B and ABC152, which was also observed in four other isolates. BIDMC 7B and these four isolates formed a closely related cluster of ST258 isolates, whereas ABC152 belonged to the unrelated ST147, suggesting a horizontal transfer and recombination event. A similar case was the wbbO:ISKpn26 (nt 1,014) disruption, present in two closely related isolates of ST258 and ST512 as well as in an unrelated ST17 isolate. Based on these results, IS disruptions within the rfb loci of K. pneumoniae O2v2 and O1v2 genomes may have occurred at least ≈35 and ≈10 times, respectively (Tables 2 and 3).
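A lower bound on the number of independent insertion events can be sketched by treating each distinct gene:IS:position signature as one event, and by listing signatures shared by different sequence types (STs) as candidate horizontal transfers. This counting rule is an assumption for illustration and does not replace the MLST and phylogenetic analyses used in the study; the signatures and STs below are taken from the text, but the isolate names other than BIDMC 7B and ABC152 are hypothetical.

```python
from collections import defaultdict

# Each record: (isolate, sequence type, gene, IS element, insertion position in nt).
Record = tuple[str, str, str, str, int]
Signature = tuple[str, str, int]


def insertion_signatures(records: list[Record]) -> dict[Signature, set[str]]:
    """Map each gene:IS:position signature to the set of STs that carry it."""
    by_sig: dict[Signature, set[str]] = defaultdict(set)
    for _isolate, st, gene, is_element, pos in records:
        by_sig[(gene, is_element, pos)].add(st)
    return dict(by_sig)


def minimum_events(records: list[Record]) -> int:
    """Lower bound on independent insertions: one event per distinct signature."""
    return len(insertion_signatures(records))


def shared_across_lineages(records: list[Record]) -> dict[Signature, set[str]]:
    """Signatures found in more than one ST (candidate horizontal transfer)."""
    return {sig: sts for sig, sts in insertion_signatures(records).items() if len(sts) > 1}


# Illustrative records only; not the study dataset.
EXAMPLE: list[Record] = [
    ("BIDMC 7B", "ST258", "gmlB", "ISR1", 818),
    ("ABC152", "ST147", "gmlB", "ISR1", 818),
    ("isolate_A", "ST258", "kfoC", "ISR1", 656),
    ("isolate_B", "ST258", "kfoC", "ISR1", 656),
    ("isolate_C", "ST34", "gmlC", "IS903B", 7956),
    ("isolate_C", "ST34", "wbbO", "IS903B", 130),
]

if __name__ == "__main__":
    print(minimum_events(EXAMPLE))  # prints: 4
    print(shared_across_lineages(EXAMPLE))
```

Collapsing identical signatures gives only a lower bound, because independent insertions into the same hotspot would be counted once.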

Discussion
Owing to their universality, reproducibility, varied resolution, and standardized high-throughput protocols, molecular biology methods have become an excellent tool for pathogen characterization and are widely applied in microbiology diagnostics and surveillance. In recent years, these methods have been revolutionized by WGS, an increasingly common approach in public health laboratories for the control of antimicrobial resistance and for bacterial genotyping. At present, WGS is also successfully used to complement laborious structural chemical analyses, such as those of microbial surface antigens, which are key pathogenicity factors as well as critical targets for vaccines and therapeutic strategies [27][28][29].
As the molecular genetics of K. pneumoniae O- and K-antigen biosynthesis has been well elucidated, new O-genotyping techniques have proven useful for serotyping. There are several examples of tracking O- or K-antigen diversity among K. pneumoniae isolates [1,14,24]. For example, Fang et al. used a PCR-based O-genotyping approach to explore the distribution of O-antigen genetic determinants in 87 clinical K. pneumoniae strains, showing a high prevalence of O1 (≈57%), followed by the O2a, O3, and O5 O-genotypes [24]. Follador et al. analyzed over 500 whole-genome sequences and reached a similar conclusion: the O1, O2, and O3 serotypes were the most common, accounting for approximately 80% of all isolates [1]. Finally, Wick et al. presented the user-friendly Kaptive Web, an online tool for the rapid typing of K. pneumoniae surface polysaccharide loci, and demonstrated its utility using more than 500 K. pneumoniae genomes [14].
Kaptive Web differentiates between the K. pneumoniae O1/O2v1 and O1/O2v2 serotypes in two steps. First, it searches for the D-galactan II-encoding genes (wbbY and wbbZ) characteristic of the O1 serotype, which distinguishes O1 from O2. Second, the genes found in the rfb cluster are analyzed, and the result is reported as variant v1 or v2 [14]. The final prediction is accompanied by length discrepancy information, which may indicate rearrangements in the rfb region.
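The two-step logic described above can be summarized in a simplified Python sketch that uses gene presence only. It is not Kaptive's implementation; the function name and boolean inputs are hypothetical. Because it checks presence rather than integrity, it reproduces the blind spot examined in this study: an IS-disrupted gmlB still counts as present.

```python
def predict_o_type(has_wbbYZ: bool, has_gmlABC: bool) -> str:
    """Simplified two-step O1/O2 call from gene presence alone (not Kaptive's code).

    Step 1: wbbY/wbbZ (D-galactan II) present -> O1, otherwise O2.
    Step 2: gmlABC present in the rfb locus -> variant 2, otherwise variant 1.
    """
    base = "O1" if has_wbbYZ else "O2"
    variant = "v2" if has_gmlABC else "v1"
    return base + variant


if __name__ == "__main__":
    # Gene presence implied by the O2v2 call for BIDMC 7B and ABC152.
    print(predict_o_type(has_wbbYZ=False, has_gmlABC=True))  # prints: O2v2
```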
In this study, we presented two cases of genotype-phenotype discrepancies for O-antigens in the K. pneumoniae clinical isolates BIDMC 7B and ABC152, whose actual phenotype was O2v1, whereas Kaptive predicted O2v2. However, the tool flagged a length discrepancy within the rfb region and recommended further analyses. For both isolates, structural analysis by HR-MAS NMR spectroscopy proved the O-antigen structure to be O2v1 (Figure 2). The ISR1 element was identified in the gmlB gene, one of the three genes responsible for the conversion of D-galactan I from v1 to v2. The presence of the IS, the actual O-antigen structures, and the lack of other obvious differences between the analyzed genomes and the reference strains indicated that the IS disruption was the reason for the discrepancy between the O-antigen phenotype and the Kaptive-based prediction. The large-scale in silico analysis of publicly available K. pneumoniae O2v2 and O1v2 genomes clearly showed that various insertions have occurred in several rfb genes, possibly causing similar divergences between O-serotype prediction and phenotype. A variety of ISs were identified, including the common elements ISR1, ISKpn14, ISKpn26, and IS903B.
As only structural verification in each strain can provide definitive proof of the O-phenotype, the in silico survey can only suggest an influence of ISs on the O-antigen chemical structure. By analogy with the BIDMC 7B and ABC152 strains, fourteen O2 strains (e.g., ASM170423, CHS57, and IS39) revealed ISs in the gmlABC region, most often disrupting gmlB or gmlC, and likely represent similar genotype-phenotype discrepancies. Three O2 strains with an IS in the gmlABC region (ASM307130, UCI 38, and BIDMC 13) had an additional disruption within the wzm-wbbO region, suggesting failure of O-antigen biosynthesis and a rough form of LPS devoid of O-PS. Other identified cases, including O1v2 isolates, also suggested O-antigen biosynthesis failure (Tables 2 and 3). Given the genetic basis of O1 and O2 antigen biosynthesis, the presence of an IS in the O-locus may influence the O-antigen phenotype by causing (i) O2v2 to O2v1 or O1v2 to O1v1 conversion, or (ii) conversion from smooth to rough LPS. Notably, the results obtained from Kaptive Web depended on the IS location. In the case of a gene disruption or frameshift mutation, the results indicate the lack of an enzyme specific to the analyzed serotype, which may lead to a false serotype prediction by the algorithm. For instance, the Kaptive results for the isolate IS39 indicated the absence of the wbbM and gmlABC genes (Table S2), whereas detailed analysis of the rfb region showed that these genes were present, with an IS disruption in gmlC and point mutations in the other three genes. Although Kaptive signals the possible presence of an IS by reporting length discrepancies, it is worth analyzing the nucleotide sequence of the rfb region more precisely to exclude serotypes falsely predicted because of errors during O-genotyping.
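The two possible consequences listed above can be encoded as a rough Python heuristic that adjusts a gene-presence call for IS-disrupted genes. The gene groupings are assumptions drawn from this discussion: a disruption in the wzm-wbbO region is taken to predict rough LPS, and a disruption of gmlA, gmlB, or gmlC in a v2 call is taken to predict v1. Genes outside both groups (e.g., kfoC) are left uninterpreted, the function name is hypothetical, and any such prediction would still require structural confirmation.

```python
# Gene groups assumed from the discussion in the text (hypothetical grouping).
CORE_GENES = {"wzm", "wzt", "wbbM", "glf", "wbbN", "wbbO"}  # wzm-wbbO region
GML_GENES = {"gmlA", "gmlB", "gmlC"}                        # v1 -> v2 modification


def expected_phenotype(predicted: str, disrupted: set[str]) -> str:
    """Adjust a gene-presence O-type call (e.g. "O2v2") for IS-disrupted genes.

    Disrupted genes outside both groups (e.g. kfoC) are not interpreted.
    """
    if disrupted & CORE_GENES:
        # Loss of a D-galactan I synthesis or export gene: expect rough LPS.
        return "rough"
    if predicted.endswith("v2") and disrupted & GML_GENES:
        # Loss of a gml gene: expect the unmodified variant 1 structure.
        return predicted[:-1] + "1"
    return predicted


if __name__ == "__main__":
    print(expected_phenotype("O2v2", {"gmlB"}))          # prints: O2v1
    print(expected_phenotype("O2v2", {"gmlB", "wbbO"}))  # prints: rough
```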
Sequence analysis of isolates from the database confirmed the occurrence of many IS insertions in the rfb region; however, the frequency of such events is hard to evaluate. Although there are no previous data on IS disruptions within the O-locus, these elements are generally common in K. pneumoniae genomes [30]. The hyperepidemic clone ST258, notorious for the production of KPC-type carbapenemases and extensive drug resistance, carries more ISs than an average K. pneumoniae isolate of another ST [30,31]. Several IS types, such as ISKpn26, are especially frequent in ST258 [30,32]. In our study, the majority of the O2 isolates identified belonged to ST258 (≈70%), and ISKpn26 was commonly found in these (≈50%). These data further emphasize the impact of ISs on the evolution of K. pneumoniae ST258; however, the over-representation of ST258 genomes in public databases, resulting from the high clinical and epidemiological relevance of these organisms, must also be considered.
According to Adams et al., 94% of K. pneumoniae strains have at least one IS in their genome, and transposition of these elements within the genome causes rearrangements and may create new genotypes [30]. One consequence of an IS disruption of the rfb cluster genes may be the protection of bacteria against the host immune system. Although the presence of ISs in the gmlABC genes may cause the phenotype-genotype discrepancy discussed above, disruption of the genes determining D-galactan I may abolish O-antigen synthesis, as in the case of the wzm or wzt genes, which code for ABC transporter components [1,16,25]. IS elements in the rfb and/or wbbYZ operons can either inhibit the expression of the O-antigen on the bacterial cell surface [24] (resulting in the rough phenotype) or cause a switch from one serotype to another. In both cases, the change in phenotype can alter or impair bacterial virulence [33]. Large-scale structural analysis of K. pneumoniae isolates could determine the consequences of ISs in the rfb region and their effects on bacterial antigenicity and host interactions. Such changes can significantly affect the ability of bacteria to survive during antibacterial therapies, for example, by altering surface antigens and thereby their reactivity with complement and antibodies, or their phage resistance [34]. Antigenic drift of LPS can be one way for bacteria to evade the immune system. This is the case, for example, in Salmonella species, in which O-antigen composition affects host-pathogen interactions during infection. Strains belonging to one serovar can have different repertoires of O-antigen-modifying genes, and the expression of these genes varies with phase variation. The gtrABC operon, acquired by horizontal gene transfer, is one such set of genes modifying the Salmonella O-antigen. These genes encode proteins that are functionally homologous to the glycosyltransferases encoded by the gmlABC gene cluster in K. pneumoniae [35,36].
This study showed that some K. pneumoniae isolates flagged by Kaptive Web as having a length discrepancy within the rfb locus, and advised for further analysis, may in fact be mis-O-serotyped by the tool. As methodology for identifying the actual O-antigen phenotype is not broadly available, we believe that developing the Kaptive algorithm in this direction would further increase its quality and usefulness, particularly for inexperienced users. Such development might be based on broader studies of isolates with unclear O-serotyping results or identified O-genotype-phenotype discrepancies, using structural analysis to precisely elucidate the O-genotype-phenotype relationships.
As O- and K-antigens are target molecules for therapeutic strategies against Klebsiella infections, it is important to broaden our knowledge of genotype-phenotype relationships. Filling the detected gaps will improve serotype predictions based on bioinformatic tools. Accurate and rapid prediction will enable the monitoring of K. pneumoniae antigenic drift, which is subject to selective pressure from therapies and vaccines.

Bacteria and Growth Conditions
K. pneumoniae BIDMC 7B (urine isolate) was obtained through BEI Resources, NIAID, NIH: "Klebsiella pneumoniae, strain BIDMC 7B, NR-41923", as a reagent purchased as part of the Klebsicure-Eurostars project (no. E!7563). Strain ABC152 (urine isolate) was recovered at the Abu Dhabi Hospital (UAE) in 2013 and was kindly provided by Agnes Pal-Sonnevend and Tibor Pal from the United Arab Emirates University. Both strains had been selected in a previous large-scale serotyping study (unpublished results) because of an inconsistency between the lack of LPS reactivity with an O2v2-specific monoclonal antibody [21] and PCR results showing the presence of gmlABC genes. Bacteria were grown on Trypcase Soy Agar plates. For semi-preparative-scale LPS preparation, the strains were cultured in Luria-Bertani (LB) broth in 500 mL flasks with shaking at 37 °C, inactivated overnight with 3% formalin at 22 °C, harvested by centrifugation, washed with water, and freeze-dried.

O-Specific Polysaccharides
K. pneumoniae PCM-27 and Kp26 O-specific polysaccharides were obtained from the Laboratory of Microbial Immunochemistry and Vaccines in the Ludwik Hirszfeld Institute of Immunology and Experimental Therapy, PAS (Wroclaw, Poland) and isolated as previously described [21].

LPS Preparation
LPS of K. pneumoniae strains BIDMC 7B and ABC152 was isolated by hot phenol/water extraction [37], purified by dialysis and ultracentrifugation as described elsewhere [38], and freeze-dried.

NMR Spectroscopy
All NMR spectra were obtained at 298 K using an Avance III 600 MHz spectrometer (Bruker BioSpin GmbH, Rheinstetten, Germany) equipped with a PH HR-MAS probe (LPS analysis) or a 5 mm QCI cryoprobe with z-gradients (O-PS analysis). NMR spectra of isolated O-PSs were obtained in ²H₂O, processed, and analyzed as described previously [39]. For high-resolution magic angle spinning (HR-MAS) NMR spectroscopy, LPS (3-4 mg) was suspended in ²H₂O and placed into the ZrO₂ rotor. Acetone was used as an internal reference (δH/δC 2.225/31.05 ppm) for both O-PS and LPS spectra [21]. The processed spectra (¹H,¹³C HSQC-DEPT, ¹H,¹H COSY, and TOCSY) were assigned with the use of NMRFAM-SPARKY (v1.2, NMRFAM, Madison, Wisconsin, USA) [40].
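Referencing against internal acetone amounts to a constant offset on each chemical-shift axis. The Python sketch below, with hypothetical function and argument names, shifts a list of ¹H,¹³C cross-peaks so that the observed acetone signal lands at the reference values quoted above. In practice this is handled in the spectrometer or assignment software, so the sketch is purely illustrative.

```python
ACETONE_REF_H = 2.225  # ppm, internal acetone reference quoted in the text
ACETONE_REF_C = 31.05  # ppm


def reference_peaks(peaks, observed_h, observed_c):
    """Shift (dH, dC) cross-peaks so acetone lands at its reference values.

    peaks: list of (dH, dC) tuples in ppm from an unreferenced 1H,13C spectrum.
    observed_h, observed_c: where the acetone cross-peak actually appears (ppm).
    """
    offset_h = ACETONE_REF_H - observed_h
    offset_c = ACETONE_REF_C - observed_c
    return [(h + offset_h, c + offset_c) for h, c in peaks]


if __name__ == "__main__":
    # Hypothetical unreferenced peak list in which acetone appears at 2.20/31.00 ppm.
    print(reference_peaks([(5.0, 100.0), (2.2, 31.0)], 2.2, 31.0))
```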

DNA Isolation
Genomic DNA of the BIDMC 7B and ABC152 strains was extracted from overnight cultures with the Genomic Mini kit (A&A Biotechnology, Gdynia, Poland).

DNA Library Preparation and Sequencing
Libraries were prepared using the Nextera XT DNA Library Preparation Kit and sequenced on an Illumina MiSeq platform (both Illumina Inc., San Diego, CA, USA).

Data Availability
The sequence of the BIDMC 7B strain is available in GenBank under accession number JCNG00000000.1. The sequence of the ABC152 strain was submitted to the GenBank database under accession number JACENF000000000.
Supplementary Materials: Supplementary materials can be found at http://www.mdpi.com/1422-0067/21/18/6572/s1. Table S1: Kaptive Web analysis results for the 8130 assemblies of K. pneumoniae isolates. Isolates belonging to the O2v2 serotype are marked in green. Isolates belonging to the O1v2 serotype are marked in blue; Table S2: Kaptive analysis results extracted for the 55 K. pneumoniae O2 variant 2 genomes characterized by IS occurrence. Each color indicates closely related strains characterized by genetic similarity. Separately analyzed isolates are not colored; Table S3: Kaptive analysis results extracted for the 5 K. pneumoniae O1 variant 2 genomes characterized by IS occurrence. Isolates belonging to the same strain are marked in blue.