A Comprehensive Genomic Analysis of the Emergent Klebsiella pneumoniae ST16 Lineage: Virulence, Antimicrobial Resistance and a Comparison with the Clinically Relevant ST11 Strain

Klebsiella pneumoniae is considered an opportunistic pathogen frequently involved with healthcare-associated infections. The genome of K. pneumoniae is versatile, harbors diverse virulence factors and easily acquires and exchanges resistance plasmids, facilitating the emergence of new threatening clones. In the last years, ST16 has been described as an emergent, clinically relevant strain, increasingly associated with outbreaks, and carrying virulence factors (such as ICEKp, iuc, rmpADC/2) and a diversity of resistance genes. However, a far-reaching phylogenetic study of ST16, including geographically, clinically and temporally distributed isolates is not available. In this work, we analyzed all publicly available ST16 K. pneumoniae genomes in terms of virulence factors, including capsular lipopolysaccharide and polysaccharide diversity, plasmids and antimicrobial resistance genes. A core genome SNP analysis shows that less than 1% of studied sites were variant sites, with a median pairwise single nucleotide polymorphism difference of 87 SNPs. The number and diversity of antimicrobial resistance genes, but not of virulence-related genes, increased consistently in ST16 strains during the studied period. A genomic comparison between ST16 and the high-risk clone ST11 K. pneumoniae, showed great similarities in their capacity to acquire resistance and virulence markers, differing mostly in the great diversity of capsular lipopolysaccharide and polysaccharide types in ST11, in comparison with ST16. While virulence and antimicrobial resistance scores indicated that ST11 might still constitute a more difficult-to-manage strain, results presented here demonstrate the great potential of the ST16 clone becoming critical in public health.


Introduction
The Gram-negative bacterium, Klebsiella pneumoniae, is considered an opportunistic pathogen associated with infections in hospitalized or otherwise immunocompromised individuals [1][2][3]. K. pneumoniae contains a core genome, and a large and variable accessory genome harboring virulence factors and antimicrobial resistance genes associated with hospital-acquired infections [2]. Among the most important virulence factors, lipopolysaccharides and polysaccharide capsule that help to evade the immune system, and iron-chelating siderophores, such as yersiniabactin (ybt) and colibactin (clb), are located on integrative conjugative elements (ICEKp), and aerobactin and salmochelin are plasmid encoded [3][4][5]. Other important virulence mechanisms are associated with the hypermucoviscous phenotype and the presence of the capsular regulator genes rmpA, rmpA2 and magA that up regulate capsule production [2,3,6]. The hypervirulent (hv) phenotype of K. pneumoniae is strongly associated with the presence of capsular regulator genes and siderophores [3,7]. The accessory genome is also central to antibiotic resistance in K. pneumoniae [2]. Resistance to carbapenems is mainly attributed to the presence of carbapenemases, including K. pneumoniae carbapenemase (KPC), Oxacilinases (OXA) and

Resistome, Virulome and Mobilome
The resistome was determined using the NCBI Antimicrobial Resistance Gene Finder Plus (AMRFinderPlus) v.3.10.24 (https://github.com/ncbi/amr, created by Michael Feldgarden, Bethesda, Maryland, United States, accessed on 5 May 2022) with the database "Core" and "Plus", and the microorganism Klebsiella as previously described [36]. The resistome data was used to evaluate changes in resistance patterns during the studied period. Results are shown in percentages representing the number of AMR genes or AMR gene variants identified per year over the total number of AMR genes or AMR gene variants identified in all ST16 genomes. Additionally, Kleborate v2.2.0 (https://github.com/katholt/Kleborate, created by Ryan Wick, Victoria, Melbourne, Australia, accessed on 5 May 2022) was used to determine the following resistance scores to ST16 isolates: 0 = no ESBL or carbapenemase; 1 = ESBL without carbapenemase (regardless of colistin resistance); 2 = carbapenamase without colistin resistance (regardless of ESBL); 3 = carbapenemase with colistin resistance (regardless of ESBL).
Finally, AMR-virulence convergence in ST16 was analyzed based on resistance and virulence scores (convergence defined as virulence score ≥3 and resistance score ≥1) generated using Kleborate as described by Lam et al. [15].
In order to understand how AMR genes are distributed in ST16 isolates worldwide, and if resistance patterns changed in time, we analyzed 912 isolates for which information about the location for sampling was available. A heatmap illustrating the number of antimicrobial resistance genes found per location is presented in Figure 2. According to reported data, the average of number of AMR genes was lower in Pakistan and Hong Kong (≤10) and higher numbers were found in Myanmar, Denmark and Honduras (>20) (Figure 2). For a total of 853 genomes (89.13%) the year of isolation was available, the first ST16 genome having been reported in 2003 and the last one in 2022. A diverse set of AMR genes ( Figure 3A) and a significant increase in maximum AMR gene numbers per genome per year were observed (p < 0.0001, R 2 = 0.7740) ( Figure 3B). On the other hand, the maximum number of virulence genes per genome did not increase over time (R 2 = 0.3472) ( Figure 3C). about the location for sampling was available. A heatmap illustrating the number of antimicrobial resistance genes found per location is presented in Figure 2. According to reported data, the average of number of AMR genes was lower in Pakistan and Hong Kong (≤10) and higher numbers were found in Myanmar, Denmark and Honduras (>20) ( Figure  2). For a total of 853 genomes (89.13%) the year of isolation was available, the first ST16 genome having been reported in 2003 and the last one in 2022. A diverse set of AMR genes ( Figure 3A) and a significant increase in maximum AMR gene numbers per genome per year were observed (p < 0.0001, R 2 = 0.7740) ( Figure 3B). On the other hand, the maximum number of virulence genes per genome did not increase over time (R 2 = 0.3472) ( Figure  3C).

Plasmid Incompatibility Groups and Transposons
Fifty-nine different plasmid incompatibility groups were detected, with almost 100% of isolates harboring at least one of such structures (949 out of 957 isolates). The most common Inc groups were Col(pHAD28) ( (Figure 4).

Plasmid Incompatibility Groups and Transposons
Fifty-nine different plasmid incompatibility groups were detected, with almost 100% of isolates harboring at least one of such structures (949 out of 957 isolates

Plasmid Incompatibility Groups and Transposons
Fifty-nine different plasmid incompatibility groups were detected, with almost 100% of isolates harboring at least one of such structures (949 out of 957 isolates  (Figure 4).

Virulence Factors and Heavy Metal Resistance
Capsular polysaccharide (K antigen) and lipopolysaccharide (O antigen) are important virulence factors in K. pneumoniae, with clinical and epidemiological importance including the identification of hypervirulent strains [54,60]. More than 138 combinations of K-locus are currently known [54]. In this study, 10 different K-types were identified (Table S2), with KL51 being the most prevalent (759/957; 79.31%). Six different O-types were found, with O3b as the most prevalent (886/957; 92.58%) (Table S2).

Genetic Diversity of ST16 Genomes
Considering the 957 ST16 isolates analyzed in this study, the pan genome consisted of 29,510 genes, of which 3761 (12.74%) made up the core genome (i.e., genes found in 99% of full genomes). The accessory genome, 25,749 genes, consisted of 23,601 uncommon genes, present in ≤15% of the isolates (cloud genes) and 1495 shell genes, present in <95% of the isolates.
Normalized polymorphism counts were used as a measure of genetic variation within the population, and we found the overall species median nucleotide divergence to be 0.003356% (median pairwise single nucleotide polymorphism [SNP] difference = 87 SNPs), suggesting that less than one percent of the core genome nucleotide sites were variant sites. This nucleotide diversity is comparable to previously shown measures of genetic variation of core genome alignment of K. pneumoniae clonal group CG23 [66].
In order to determine the phylogeny of the 957 ST16 isolates, a maximum-likelihood tree was inferred from the aligned core genomes by roary ( Figure 6, which is also available for interactive viewing at https://microreact.org/project/drXbkjrfrVUP6TiEQAVh7r-treephylogenetic-kp-st16. Two major groups were identified, one consisting of 47.23% (452/957) of the isolates (Figure 6, which is also available for interactive viewing at https://microreact.org/project/56YFd6ALYc1eLHMzjX6wJa-tree-phylogenetic-kp-st16group-1), and one by 35.94% (344/957) ( Figure 6, which is also available for interactive viewing at https://microreact.org/project/kFHeSw2Qr9cecxdgX7oSnt-tree-phylogenetickp-st16-group-2). The first group included isolates from 37 of the 40 countries that reported ST16 isolates, contrasting with the second group that depicted isolates from only 8 countries. The majority of samples from the second group originated from Thailand A combination of ICEKp10 and of colibactin (clb3), which induces DNA damage in eukaryotic cells [54], was detected in four genomes, two of them isolated in the United States and two in Brazil. Besides clb3, other markers of hypervirulence were detected in ST16 genomes: rmpADC and/or rmpA2 in 11 genomes (1.14%), aerobactin in 26 genomes (2.71%), and salmochelin in 2 genomes (0.20%).
AMR and virulence genes used to be segregated in non-overlapping K. pneumoniae populations, but recent reports highlight the convergence of AMR-virulent strains which have the potential to cause difficult-to-treat infections [34]. In 25 genomes evaluated in this study, AMR-virulence convergence was detected, a result that highlights the need to monitor the dissemination of ST16, especially in Australia, India and the United States, where these isolates were reported.

Genetic Diversity of ST16 Genomes
Considering the 957 ST16 isolates analyzed in this study, the pan genome consisted of 29,510 genes, of which 3761 (12.74%) made up the core genome (i.e., genes found in 99% of full genomes). The accessory genome, 25,749 genes, consisted of 23,601 uncommon genes, present in ≤15% of the isolates (cloud genes) and 1495 shell genes, present in <95% of the isolates.
Normalized polymorphism counts were used as a measure of genetic variation within the population, and we found the overall species median nucleotide divergence to be 0.003356% (median pairwise single nucleotide polymorphism [SNP] difference = 87 SNPs), suggesting that less than one percent of the core genome nucleotide sites were variant sites. This nucleotide diversity is comparable to previously shown measures of genetic variation of core genome alignment of K. pneumoniae clonal group CG23 [66].
In order to determine the phylogeny of the 957 ST16 isolates, a maximum-likelihood tree was inferred from the aligned core genomes by roary ( Figure 6, which is also available for interactive viewing at https://microreact.org/project/drXbkjrfrVUP6TiEQAVh7r-treephylogenetic-kp-st16, accessed on 5 May 2022. Two major groups were identified, one consisting of 47.23% (452/957) of the isolates ( Figure 6, which is also available for interactive viewing at https://microreact.org/project/56YFd6ALYc1eLHMzjX6wJa-tree-phylogenetickp-st16-group-1, accessed on 5 May 2022), and one by 35.94% (344/957) ( Figure 6, which is also available for interactive viewing at https://microreact.org/project/kFHeSw2Qr9 cecxdgX7oSnt-tree-phylogenetic-kp-st16-group-2, accessed on 5 May 2022). The first group included isolates from 37 of the 40 countries that reported ST16 isolates, contrasting with the second group that depicted isolates from only 8 countries. The majority of samples from the second group originated from Thailand (89.53% [308/344]), corresponding to 92.21% (308/334) of the genomes reported by this country.
ICEKp was detected in 30.75% (139/452) of isolates from group 1, where the most diversity of ICEKp was also found (ICEKp 1, 2, 3, 4, 10, and 12); only ICEKp11 was not found in this group ( Figure 6, which is also available for interactive viewing at https://microreact. org/project/qrmdyoehcemjUDndtbnE4A-tree-phylogenetic-kp-st16-icekp-type, accessed on 5 May 2022), but ICEKp11 was identified in a single isolate from Australia (2019), and this isolate did not cluster in either of the two groups.
Due to the over-representation of isolates from Thailand in group 2 (92.21%, 308/334 isolates) we investigated their origin in more detail. The genomes derive from 8 distinct NCBI Bioprojects (PRJNA389557, PRJNA414481, PRJNA497260, PRJNA532291, PRJNA717739, PRJDB5929, PRJNA644880 and PRJEB19226), 75% of them were reported in Bangkok between 2015 and 2018. The second largest dataset came from the province of Surat Thani (10.38%, 32/308 isolates). Taken together, the data suggest that isolates from Thailand do not correspond to a restricted location and represent the diversity of ST16 found in the country.
A total of 161 isolates did not cluster within groups 1 or 2. Interestingly, 72.04% (116/161) of these isolates carried ICEKp4, and this structure was only found in three isolates (3/452) belonging to group 1. Even though all these 116 isolates harbored ICEKp4, the isolates were significantly distinct in their core genome, distributed in 5 continents (Asia, Central America, Europe, North America and Oceania), and found in a total of 18 countries between 2012 and 2020.  (Figure 6, which is also available for interactive viewing at https://microreact.org/project/kFHeSw2Qr9cecxdgX7oSnt-tree-phylogenetic-kp-st16-group-2), with the exception of one isolate, also from Thailand, reported in 2017. Despite the widespread of ICEKp3 within this group, 111/452 (24.55%) of isolates from group 1 also harbored ICEKp3.  (Figure 6, which is also available for interactive viewing at https://microreact. org/project/kFHeSw2Qr9cecxdgX7oSnt-tree-phylogenetic-kp-st16-group-2, accessed on 5 May 2022), with the exception of one isolate, also from Thailand, reported in 2017. Despite the widespread of ICEKp3 within this group, 111/452 (24.55%) of isolates from group 1 also harbored ICEKp3.
The most common gene encoding a carbapenemase among ST16 isolates was bla OXA-232 , present in 38.24% of ST16 isolates ( Table 1). The bla OXA-232 gene was found in ST16 isolates from 13 countries (Thailand, Unknown, Australia, UK, USA, Netherlands, France, Canada, India, Switzerland, Italy, Qatar, Sri Lanka and Turkey), but most frequently found in isolates from Thailand (81.96%) (300 out of 366 isolates from Thailand carried this gene). Due to the high contribution of isolates from Thailand, we reevaluated the data considering only isolates from the other 12  CTX-M-15 is the most prevalent ESBL type in Europe, having also been found in North America and Asia more recently [14]. This gene was detected in 85.78% of the ST16 isolates (Table 1), against 12.08 to 41.9% reported for ST11 [10,14]. On the other hand, bla CTX-M-65 is prevalent in ST11 isolates (28.9 to 59.61%) [10,14], but not found in ST16 isolates (Table S1). OXA-48 was similarly found in both STs, in 12.5% of ST11 [10] and 8.35% of ST16 (Table 1). In our study, bla OXA-1 was found in 40.33% ST16 (Table S1), against 12.08% reported for ST11 [14].
Kleborate virulence scores capture the hierarchy of virulence-associated loci that have emerged over the last two decades according to the literature [34]. The score considers the presence of ybt, clb and iuc, and is interpreted as follows: score 0 = none present, 1 = yersiniabactin only, 2 = colibactin without aerobactin (regardless of yersiniabactin, however, ybt is almost always present when clb is), 3 = aerobactin only, 4 = aerobactin and yersiniabactin without colibactin, and 5 = all three present. Yersiniabactin facilitates immune escape and has been shown to increase virulence in various strains [34,70]. The presence of the genotoxin clb together with ybt may increase virulence due to clb's genotoxic activity in mammalian cells [34,71], but limited evidence indicates increased virulence in the absence of the virulence plasmid [34]. The iuc is related to sepsis since it promotes bacterial growth in blood via the acquisition of iron from transferrin [34,72,73]. Taken together, there is limited data assessing the individual contributions of ybt, clb and the virulence plasmid when present in combination, but it is logical to score iuc + ybt with higher virulence than iuc without ybt [34]. The combination of iuc (virulence plasmid) + ybt + clb as the highest score, is associated with hypervirulent strains in liver abscesses documented worldwide (CG23-I) [34]. Kleborate showed similar results for ST11 and ST16, although more ST11 isolates showed Score 2 (7.1%) and isolates showed Score 4 (2.8%) ( Figure 7A). Kleborate score ( Figure 7B), although more ST16 isolates belonged to Score 1 (ESBL+ and Carb-) (see "Material and Methods") when compared to ST11.
Kleborate resistance scores do not include the presence of intrinsic variants of oqxAB, chromosomal fosA and ampH since they are not associated with clinical resistance in K. pneumoniae species complex (KpSC) [34].  The resistome of ST16 is also alarmingly similar to ST11, as confirmed by the Kleborate score ( Figure 7B), although more ST16 isolates belonged to Score 1 (ESBL+ and Carb-) (see Section 2."Material and Methods") when compared to ST11.
Kleborate resistance scores do not include the presence of intrinsic variants of oqxAB, chromosomal fosA and ampH since they are not associated with clinical resistance in K. pneumoniae species complex (KpSC) [34].

Discussion
K. pneumoniae belonging to ST16 is an important emergent lineage that can carry determinants to carbapenem resistance and is believed to have an enhanced virulence potential, as observed by patient survival analysis and in vitro studies [15,16,74].
According to the core SNP analysis of 957 ST16 genomes, two major groups comprised over 80% of all reported isolates (796/957; 83.17%), with the major group including globally distributed isolates reported in Africa, Asia, Europe, North America, South America and Oceania.
This study showed a significant increase in the maximum number of AMR genes in a genome per year ( Figure 3B; p < 0.0001), a finding that highlights the importance of monitoring the emergence of extremely drug-resistant (XDR) ST16 isolates, already reported in several locations with distinct antimicrobial resistance profiles and clear evidence of horizontal gene transfers [19,20,[75][76][77]. Forty-eight different AMR genes associated with resistance to β-lactam antibiotics, including ESBL, were identified, with a drastic increase in the number of AMR genes per genome since 2012: between 2003 and 2011, four to eight AMR genes or gene variants were found; and between 2012 and 2021, 11 to 28 AMR genes or variants associated with resistance to β-lactam antibiotics, including ESBL, were identified. These finding demonstrate that ST16 isolates have been acquiring different AMR genes and that many of these genes are now widespread among ST16 isolates. The increasing incidence of ESBL and carbapenem resistant of ST16 isolates, coupled with their increasing global distribution, also emphasizes their potential contribution for rapid dissemination of AMR genes to other highly-virulent pathogens [78].
A similar scenario was observed for aminoglycosides, with numbers between 2003 and 2011 varying from two to six genes or variants, and between 2012 and 2021 increasing to 10-18. Thirty-one genes associated with this type of resistance were found, the most common being aac(6 )-Ib-cr5 encoding an aminoglycoside acetyltransferase that also has activity against fluoroquinolones [79,80]. The aac(3)-IVa (n = 10 genomes) and aph(4)-Ia (n = 10 genomes) were identified exclusively in the United States. For quinolones, there was no significant increase in the number of AMR in time, with two to nine genes or variants identified per genome per year during the studied period. Noteworthy was the presence of qnrE1, reported in five isolates from Brazil in 2018. This gene was originally found in a conjugative plasmid in a clinical K. pneumoniae isolate [81]. Also noteworthy are reports of qnrB1 since 2012 in 13 countries but mostly prevalent in the United States (18/45; 40%), and qnrS1, reported since 2013 in 19 different countries but most prevalent in the United States, Australia and Thailand (19/97; 19.58%), (17/97; 17.52%) and (14/97; 14.43%), respectively. Both genes have been found in plasmids [55,56].
The contribution of horizontal gene transfer to the spread of AMR genes could be clearly illustrated by the widespread distribution of ICEKp and plasmids. Seven types of ICEKp (1, 2, 3, 4, 10, 11 and 12) were identified, found in 63.21% (605/957) of the evaluated isolates, similar to a previous report that found ICEKp in 61.04% of the evaluated ST16 genomes [10].
A direct link between AMR genes and plasmids is not always possible with short-read assemblies, but their central role in the dissemination of AMR genes, and importance for epidemiological discussions, is well known [10,82,83]. The two plasmid replicons were widespread in the United States and identified exclusively in this country: FII(pBK30683) (n = 17) and pKPC-CAV1193 (n = 37). The analysis of 334 ST16 isolates from Thailand identified replicons ColKP3 and IncFIA/B, present in 90.11% (301/334) and 92.21% (308/334) of isolates, respectively, corroborating with findings by Abe et al. [84]. These replicons have been associated with the dissemination of bla NDM-1 and bla OXA-232 in Thailand and Canada [84], and ColKP3-harboring bla OXA-232 was also reported by Argimón et al. [10,85]. K. pneumoniae ST16 co-carrying bla NDM-1 and bla OXA-232 is considered a successful clone in Thailand, a characteristic they fund to be exclusive to ST16 isolates among 43 STs classifying 577 carbapenem-resistant K. pneumoniae isolates in a nationwide surveillance study carried out in the country [86]. Here, we found 286 isolates carrying this combination of genes, 90.20% of which (258/286) were from Thailand.
KPC-encoding genes are often found in transposons, with Tn4401 being the most prominent transposon type associated with KPC-2 [58]. Accordingly, 95.49% (106/111) of isolates that carried KPC in this study also harbored a Tn4401-like transposon. KPC was mostly detected in Brazil and the United States, which together accounted for 78.37% (87/111) of cases. The first report of an ST16 isolate in Brazil harboring KPC-2 occurred in 2008 [30]. Since then, this clone has been often associated with KPC-2, nowadays endemic both in Brazil and the United States [15,16,32,68,87]. The inconsistent carriage of KPC-2 within the ST16 clone suggests the KPC enzyme was acquired through the horizontal transfer of plasmids after the divergence of the ST16 lineage rather than vertical gene transfer among the entire lineage [60].
K. pneumoniae isolates may belong to high-risk clonal groups: hyper-virulent clonal complexes (CCs), mostly responsible for community-acquired invasive infections, and multidrug-resistant CCs, mostly involved in health-care associated infections [60,88,89]. The majority of ST16 isolates investigated here belong to multidrug-resistant CCs, in agreement with previous reports [17,19,76]. Worth noting is that recent studies have shown increased mortality associated with ST16 but have not found specific characteristics of ST16 that justified this association [15,17]. Our results corroborate with these reports since less than 3% of ST16 harbored markers of hyper-virulence, highlighting the need for a deeper understanding of genetic features of ST16.
Finally, due to the clinical importance of K. pneumoniae clone ST11 [14,34], we investigated similarities and differences between ST16 and ST11. CPS and LPS O antigen is a key virulence determinant in K. pneumoniae. The great diversity of CPS and adjacent LPS O antigen of ST11 is associated with a recombination hotspot at the biosynthesis loci [29,90,91]. In ST16 strains, this hotspot in the biosynthesis loci has not been described, and the ST16 isolates investigated here showed a reduced variability of CPS and LPS, withKL51 and O3b the predominant serotypes.
Although ST11 and ST16 showed similar virulence scores, ST11 has been more studied and associated with hyper-virulence, such as the ST11-KL64 strain reported in China, carrying rmpA/rmpA2, iucABCD and iutA genes [29]. Given the global success of ST11 and similarities with ST16 in terms of virulence and antibiotic resistance, this emerging clone has the potential to become a global epidemic dual-risk clone and a major public health threat.