From genotype to phenotype: a clinical perspective for hospital-acquired pneumonia

Clinical metagenomics (CMg), defined as the application of next-generation sequencing (NGS) to clinical samples, is a promising tool for the diagnosis of hospital-acquired pneumonia (HAP). CMg allows the identification of pathogens and antibiotic resistance genes (ARGs), thereby providing the information required to optimize the antibiotic regimen. Hence, provided that CMg is faster than conventional culture, the probabilistic regimen used in HAP could be tailored sooner, which is expected to decrease mortality and morbidity. While inferring antibiotic susceptibility from metagenomic or even genomic data is challenging, only a limited number of antibiotic families are used in the probabilistic regimen of HAP (namely beta-lactams, aminoglycosides, fluoroquinolones, glycopeptides and oxazolidinones). Accordingly, with a view to applying CMg to the early diagnosis of HAP, we aimed to review the performance of whole genome sequencing (WGS) of the main HAP-causing bacteria (Enterobacteriaceae, Pseudomonas aeruginosa, Acinetobacter baumannii, Stenotrophomonas maltophilia and Staphylococcus aureus) for the prediction of susceptibility to the antibiotic families advocated in the probabilistic regimen of HAP.


Introduction
Clinical metagenomics (CMg) refers to the concept of sequencing the DNA of a clinical sample in order to recover clinical information [1]. In the context of the diagnosis of infections, CMg consists in sequencing samples in order to identify putative pathogens and to predict their antibiotic susceptibility profiles. CMg has been applied to an increasing diversity of samples: respiratory samples [2,3], urine [4,5], cerebrospinal fluid or brain biopsy [6,7], blood [8-10], bone and joint infection samples [11,12] and skin granuloma [13]. CMg takes advantage of the recent development of sequencing methods together with bioinformatics tools. So far, most CMg studies have used Illumina-based technology, which typically generates millions of reads of 150-300 bp. More recently, long-read sequencing methods have been developed (Pacific Biosciences and Oxford Nanopore) but they have barely been used for CMg studies [2,5]. While CMg is a promising approach for the diagnosis of infections, several hurdles remain to be tackled: the removal of the host DNA [2,11,12,14]; the capacity to detect and reliably identify pathogens in polymicrobial samples [11,12]; the detection of antibiotic resistance genes (ARGs) and other genomic determinants [15]; the assessment of the linkage between ARGs and their host in case of polymicrobial samples [11]; the linking of a phenotype to the detected ARGs; the turn-around time (though it has recently tended to decrease, especially when using the Nanopore sequencers [5]); the establishment of consensual quality control markers; the distinction between pathogens and contaminants [16]; and, last but not least, the reimbursement of the assay by healthcare structures.
Hospital-acquired pneumonia (HAP) is defined as pneumonia that occurs 48 hours or more after admission and that was not incubating at the time of admission. HAP also includes ventilator-associated pneumonia (VAP), which accounts for up to 25% of all intensive care unit (ICU) infections and for more than 50% of the antibiotics prescribed in this setting [17]. The recommended management of HAP relies on the combination of clinical and bacteriological data [18]. When HAP is suspected (on clinical and radiological grounds), a clinical sample from the lower respiratory tract is collected for quantitative cultures prior to any new antibiotic treatment. The time to antibiotic susceptibility results is usually at least 48 hours, during which a probabilistic antibiotic regimen is given, which may or may not take into account the likelihood of a resistant bacterium causing the pneumonia according to the risk factors of the patient [18]. Based on current guidelines, few antibiotic families are considered for the probabilistic therapy, namely beta-lactams, aminoglycosides, fluoroquinolones, glycopeptides and oxazolidinones. At least in immunocompetent patients, HAP is caused by a limited spectrum of bacterial pathogens (possibly more than one: co-pathogens), and rarely by viral or fungal pathogens.
In early-onset HAP (occurring within four days of hospitalization [18]), the most frequently encountered bacteria are Enterobacteriaceae (mostly drug-susceptible), Haemophilus spp., Streptococcus pneumoniae and Staphylococcus aureus (methicillin-susceptible). In late-onset HAP (from day five of hospitalization), Pseudomonas aeruginosa, Acinetobacter baumannii, S. aureus (including methicillin-resistant S. aureus) and drug-resistant Enterobacteriaceae are the most frequent agents [19]. Hence, from a genotype-to-phenotype perspective, only a limited number of antibiotic-bacterial species combinations need to be investigated in the context of HAP. To our knowledge, the use of CMg in the context of HAP has been reported once [2], but antibiotic resistance profiles of the pathogens were not investigated [20].
While other methods based on real-time PCR are available and may enable the detection of pathogens and of some ARGs [21,22], they include a limited panel of such targets and do not cover the mutational events that can be associated with antibiotic resistance. Hence, CMg could overcome these limitations by being able to reconstruct genomes and precisely infer the antibiotic susceptibility profile, possibly before the culture results [5]. Yet, the in silico translation from genotype to phenotype may be challenging because it relies on the quality and exhaustiveness of the available knowledge about the genomic determinants of resistance. First, the ARG database needs to be exhaustive so that no ARG is missed. Then, the resistance pattern conferred by the ARGs needs to be known, which is sometimes not possible for variants that have not been experimentally tested. Last, many resistance phenotypes arise from mutational events that lead to a decreased affinity of the antibiotic for its target (e.g. mutations in the topoisomerase in fluoroquinolone resistance), to an increased expression of an intrinsic resistance gene (e.g. blaAmpC in Enterobacteriaceae) or to a decreased expression of a gene (e.g. oprD in Pseudomonas aeruginosa). Whereas acquired ARGs have been thoroughly collected and studied, data linking specific mutational events with a defined resistance phenotype are lacking, thereby introducing some weaknesses in the genotype-to-phenotype process for some bacterium-antibiotic pairs.
In this review, we focused on the results of the various genotype-to-phenotype studies that have been performed for the main HAP pathogens. We excluded S. pneumoniae, Haemophilus influenzae, Legionella pneumophila and Moraxella catarrhalis, which are occasionally found in HAP but raise little concern about antibiotic resistance in this context. Indeed, provided that they are correctly identified by CMg, the risk of resistance to the adapted antibiotic regimen should be very low. Also, we focused on the antibiotic families that are advocated in the probabilistic therapy, before the culture results are available, because we assume that CMg would be employed as a rapid test for the earlier adaptation of the probabilistic therapy of HAP.

Protocol
Regarding antibiotic resistance, whole genome sequencing (WGS)-based genotype-to-phenotype studies rely on the detection of ARGs stored in dedicated databases, the most popular being ResFinder [23], CARD [24] and ARG-ANNOT [25] (Figure 1). The ARG sequences are usually sought using BLAST (BLASTN, BLASTP or tBLASTN [26]), with thresholds varying from 80 to 98% identity over 50 to 80% of the reference sequence, according to the studies. Other studies have used the relative coverage, calculated as the product of the identity and the coverage of the reference [27].
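As a minimal sketch of how such thresholds might be applied to alignment hits, the following Python snippet filters hypothetical hits on identity and reference coverage, and computes the relative coverage as the product of the two. The `Hit` structure, the function names, the default cut-offs (90% identity and 60% coverage, chosen arbitrarily within the ranges quoted above) and the example values are assumptions made for illustration; they do not reproduce the pipeline of any cited study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str              # name of the reference ARG in the database
    identity: float        # percent identity of the alignment (0-100)
    aligned_length: int    # number of reference positions covered by the alignment
    reference_length: int  # full length of the reference sequence


def coverage(hit):
    """Percentage of the reference sequence covered by the alignment."""
    return 100.0 * hit.aligned_length / hit.reference_length


def relative_coverage(hit):
    """Product of identity and coverage, both expressed as fractions."""
    return (hit.identity / 100.0) * (coverage(hit) / 100.0)


def passes_thresholds(hit, min_identity=90.0, min_coverage=60.0):
    """Keep a hit only if it meets both (assumed) cut-offs."""
    return hit.identity >= min_identity and coverage(hit) >= min_coverage


# Hypothetical hits, for illustration only.
hits = [
    Hit("geneA", identity=95.0, aligned_length=800, reference_length=1000),
    Hit("geneB", identity=85.0, aligned_length=1000, reference_length=1000),
]
kept = [h.gene for h in hits if passes_thresholds(h)]
print(kept)
```

With these assumed cut-offs, only the first hypothetical hit is retained; a distant homologue such as the second one would be discarded, which illustrates why strict thresholds favour specificity over the discovery of new genes.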
The identity and coverage thresholds used in these studies are high enough to be specific, but they do not allow the identification of new resistance genes for which no close homologue is included in the database. Besides alignment-based tools, the hidden Markov model (HMM)-based tool Resfams [28] has also been used to detect ARGs [29]. Once identified in the genomic data, an ARG is assumed to be expressed enough to confer resistance to the antibiotics it has been described to provide resistance to. For instance, if a blaCTX-M gene (encoding a CTX-M-type extended-spectrum beta-lactamase [ESBL]) is detected in an E. coli genome, the strain is considered resistant to all beta-lactams but piperacillin-tazobactam, cephamycins and carbapenems. Still, the antibiotic spectrum is not precisely known for all ARGs found in the databases, as only a fraction has been tested in detail, the others being homologues.
Hence, some phenotypes are inferred from the phenotype of the closest homologue that has been characterized. For TEM and SHV beta-lactamases, a precise analysis of mutations at the positions known to alter the phenotype (towards the ESBL phenotype, resistance to inhibitors or both [complex mutant phenotype]) has to be performed to infer the spectrum of resistance (see http://www.lahey.org/studies/). Ultimately, in NGS-based genotype-to-phenotype studies, the comparator is phenotypic antibiotic susceptibility testing, performed by disk diffusion or broth dilution.
Three types of results are obtained: correct when WGS agrees with conventional methods, major errors (ME) when WGS predicts resistance while the strain is tested susceptible and very major errors (VME) when WGS predicts susceptibility while the strain is tested resistant.
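The three categories above can be sketched as follows. This is an illustrative Python example, not the scoring code of any cited study; the function names are hypothetical, and the choice of denominators (susceptible strains for the ME rate, resistant strains for the VME rate) is an assumption, since individual studies may report these rates differently.

```python
from collections import Counter


def classify(predicted_resistant, tested_resistant):
    """Compare a WGS prediction with the phenotypic result for one strain."""
    if predicted_resistant == tested_resistant:
        return "correct"
    # WGS predicts resistance but the strain tests susceptible: major error (ME).
    # WGS predicts susceptibility but the strain tests resistant: very major error (VME).
    return "ME" if predicted_resistant else "VME"


def summarize(pairs):
    """pairs: list of (predicted_resistant, tested_resistant) booleans."""
    counts = Counter(classify(p, t) for p, t in pairs)
    n_resistant = sum(1 for _, t in pairs if t)
    n_susceptible = len(pairs) - n_resistant
    return {
        "correct_rate": counts["correct"] / len(pairs) if pairs else None,
        # Assumed denominators: ME over susceptible strains, VME over resistant strains.
        "me_rate": counts["ME"] / n_susceptible if n_susceptible else None,
        "vme_rate": counts["VME"] / n_resistant if n_resistant else None,
    }


# Hypothetical results for five strains and one antibiotic.
example = [(True, True), (False, False), (True, False), (False, True), (False, False)]
print(summarize(example))
```

Reporting the error rates against the number of susceptible or resistant strains makes it apparent when a rate rests on very few isolates, which is the situation flagged in bold in Table 1.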

Escherichia coli
Most of the genotype-to-phenotype studies in Enterobacteriaceae have been performed in E. coli [30-32]. With regard to the antibiotics used in the probabilistic regimen of HAP, E. coli does not harbour any intrinsic ARG other than the gene of a chromosomal AmpC-type cephalosporinase. Unlike in other AmpC-producing Enterobacteriaceae though, the E. coli blaAmpC is not regulated by the AmpD/AmpR system and has a weak constitutive expression [33]. Specific mutations in the promoter and/or in the upstream regulatory loop can nonetheless lead to a substantial expression of blaAmpC and to cephalosporin resistance [34,35], but they are rarely found in clinical isolates. Accordingly, E. coli resistance to the antibiotics used in the first line of HAP is mainly driven by acquired ARGs, and the correct prediction rates for antibiotic susceptibility have consistently been high across the three studies: 98.6-100% for ampicillin/amoxicillin, 100% for amoxicillin-clavulanate, 97.2-100% for carbapenems, 97.9-100% for fluoroquinolones and 100% for amikacin [30-32] (Table 1). Still, some discrepancies were observed, including a blaTEM-1-harbouring E. coli that was unexpectedly susceptible to amoxicillin (MIC 6 mg/L) [31] and a strain with mutations in the promoter region of blaAmpC found to be susceptible to third-generation cephalosporins (3GC). Besides, major errors were found for ceftazidime in strains producing a CTX-M [31], the most frequent ESBL found in clinical isolates. CTX-M ESBLs confer a low-level resistance to ceftazidime and the European Committee on Antimicrobial Susceptibility Testing (EUCAST) advocates considering as susceptible a strain with a ceftazidime MIC ≤1 mg/L, whereas in NGS interpretation, a strain harbouring a blaCTX-M gene is considered resistant to all 3GC.
Another strain had an unexpected resistance to 3GC while no acquired beta-lactamase was detected, and the likely explanation was the presence of an S287R amino acid substitution in AmpC [32]. For fluoroquinolones, the observed VME was explained by the fact that mutational events were not considered [30].
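The ceftazidime discrepancy described in this section can be illustrated with a short sketch. The rule (any blaCTX-M gene implies resistance to all 3GC) and the 1 mg/L susceptibility cut-off are taken from the text; the function names, the prefix-based gene matching and the example strains are hypothetical, and intermediate categories are ignored for simplicity.

```python
CEFTAZIDIME_SUSCEPTIBLE_MAX_MIC = 1.0  # mg/L, cut-off quoted in the text


def wgs_predicts_3gc_resistance(detected_genes):
    """Rules-based call: any blaCTX-M gene implies resistance to all 3GC."""
    return any(gene.startswith("blaCTX-M") for gene in detected_genes)


def phenotype_is_susceptible(ceftazidime_mic):
    """Phenotypic call based on the ceftazidime MIC (mg/L)."""
    return ceftazidime_mic <= CEFTAZIDIME_SUSCEPTIBLE_MAX_MIC


def outcome(detected_genes, ceftazidime_mic):
    """Return 'correct', 'ME' or 'VME' for ceftazidime."""
    predicted_resistant = wgs_predicts_3gc_resistance(detected_genes)
    tested_susceptible = phenotype_is_susceptible(ceftazidime_mic)
    if predicted_resistant and tested_susceptible:
        return "ME"
    if not predicted_resistant and not tested_susceptible:
        return "VME"
    return "correct"


# Hypothetical strain carrying a CTX-M gene but with a low ceftazidime MIC.
print(outcome(["blaCTX-M-15", "blaTEM-1"], 1.0))
```

In this simplified view, the major error does not reflect a missed gene but a mismatch between a gene-level rule and a breakpoint-level interpretation of a low-level resistance.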

Klebsiella pneumoniae
K. pneumoniae is intrinsically resistant to penicillins via the production of a narrow-spectrum beta-lactamase (of SHV, LEN or OKP type), which is constitutively expressed. Two studies have focused on the NGS genotype-phenotype correlation in K. pneumoniae. In the study from Stoesser et al., the correct prediction rates for antibiotic susceptibility were high but not as high as for E. coli: 98.6% for amoxicillin-clavulanate, 97.2% for 3GC, 98.6% for carbapenems, 91.3% for fluoroquinolones and 98.6% for gentamicin (Table 1). As for discrepancies, a strain was unexpectedly found to be susceptible to amoxicillin-clavulanate while an oxacillinase-encoding gene (blaOXA-1) was detected. Also, two strains had mutations in the topoisomerase GyrA but were surprisingly characterized as susceptible to fluoroquinolones. Conversely, two strains and one strain were found to be resistant to 3GC and meropenem, respectively, while no acquired beta-lactamase gene that could explain this phenotype was found. In the study from our group [36], 18 multidrug-resistant strains were sequenced. For all the antibiotics considered in the HAP context, a correct prediction was observed in all strains (Table 1).

Other Enterobacteriaceae involved in HAP
To our knowledge, no NGS-based genotype-to-phenotype study has been performed for other HAP-causing Enterobacteriaceae (Citrobacter freundii, Citrobacter koseri, Enterobacter aerogenes, Enterobacter cloacae, Hafnia alvei, Klebsiella oxytoca, Morganella morganii, Proteus mirabilis, Proteus vulgaris, Providencia stuartii, and Serratia marcescens). In a recent study though, Pesesky et al. compared the performance of a rules-based prediction algorithm (such as that used in the other NGS-based genotype-to-phenotype studies) to that of a logistic regression-based prediction algorithm, using the HMM-based tool Resfams [29]. They included 78 strains: 34 E. coli, 29 K. pneumoniae, 9 E. cloacae and 6 E. aerogenes. While species-level data were not provided, the overall accuracy of the rules-based algorithm was 89%, with an ME rate of 6% and a VME rate of 4.9%. The logistic regression-based prediction algorithm performed similarly, with an overall accuracy of 90.8%, but with a lower ME rate (2.6%) and a higher VME rate (6.6%). We assume that the prediction of the susceptibility to 3GC would be challenging for AmpC-producing Enterobacteriaceae since data about the mutational events leading to AmpC overexpression are lacking.
To sum up, several genetic events (most of them still uncharacterized) can modulate the expression of the latter resistance determinants, making the genotype-to-phenotype inference quite hazardous for antibiotic susceptibility prediction. Consequently, the correct prediction rates (based on the whole genome sequencing of 388 strains [37]) for meropenem (92.4%), levofloxacin (92.8%) and amikacin (81.5%) were lower than those observed for E. coli and K. pneumoniae (Table 1). Clearly, determinants other than those already associated with resistance in P. aeruginosa need to be identified.

Acinetobacter baumannii
A. baumannii is notorious for being involved in hospital-acquired infections including HAP. Like P. aeruginosa, A. baumannii harbours intrinsic beta-lactamases (a non-inducible AmpC-type cephalosporinase and an OXA-type carbapenemase [OXA-51] that is barely expressed in wild-type strains) and efflux pumps (AdeABC) whose overexpression can lead to antibiotic resistance. Thus, as for P. aeruginosa, the prediction of antibiotic susceptibility from genomic data is expected to be challenging.
Unfortunately, to date, no genotype-to-phenotype study has been performed. The developers of the ARG-ANNOT database [25] have looked for ARGs in a collection of 178 A. baumannii strains, but they did not compare the output with phenotypic data.

Stenotrophomonas maltophilia
As for A. baumannii, S. maltophilia is often encountered in hospital-acquired infections such as HAP, especially in patients in whom carbapenems have previously been administered. Indeed, S. maltophilia is intrinsically resistant to carbapenems, and more globally to all beta-lactams but ceftazidime and ticarcillin-clavulanate. This phenotype is due to the constitutive expression of two beta-lactamases: L1 (belonging to Ambler class B) and L2 (belonging to Ambler class A and susceptible to inhibition by clavulanate). The level of expression of L2 combined with the expression of intrinsic efflux pumps (SmeABC, SmeCDE) can lead to resistance to all beta-lactams.
S. maltophilia also resists aminoglycosides in a temperature-dependent fashion involving the polarity of the lipopolysaccharide [38,39]. Besides, it remains susceptible to fluoroquinolones even if they bind the DNA gyrase less efficiently (the serine or threonine usually found at position 83 of GyrA being a glutamine in S. maltophilia). As for P. aeruginosa and A. baumannii, inferring the antibiotic susceptibility of S. maltophilia from genomic data is expected to be challenging, as the mutational events leading to the overexpression of intrinsic ARGs (especially that of L2) remain to be precisely determined. Nonetheless, S. maltophilia is susceptible to sulphonamides, and the sulfamethoxazole-trimethoprim combination is recommended as a first-line regimen in infections caused by S. maltophilia, together with a beta-lactam (ticarcillin-clavulanate or ceftazidime). As sulphonamide resistance occurs through the acquisition of sul genes, WGS should be able to predict the susceptibility to sulphonamides with good accuracy.

Staphylococcus aureus
Resistance to the main antibiotics used in HAP mostly occurs in S. aureus via the acquisition of ARGs. Resistance to penicillins is mediated by the acquisition of the beta-lactamase-encoding gene blaZ, and methicillin resistance arises via the acquisition of the PBP2a-encoding gene mecA.
Resistance to aminoglycosides in S. aureus is due to the acquisition of the aph(3')-IIIa, ant(4')-Ia and aac(6')-aph(2'') genes, while resistance to fluoroquinolones occurs through mutations in the topoisomerases. Resistance to glycopeptides is more complex: it can be due to the acquisition of the van operon, but such strains have rarely been isolated to date. More common are strains with intermediate susceptibility to glycopeptides (glycopeptide-intermediate S. aureus, GISA) due to the thickening of the cell wall [40] through the overexpression of vraSR, a two-component system that regulates the expression of the murZ, pbp2 and sgtB genes involved in cell wall synthesis [41].
Another gene, tcaA [42], and more recently yycG (a component of the WalKR sensory regulatory system) have also been associated with the GISA phenotype [43], suggesting that it can be reached by several routes. Still, the precise mutational events in vraSR, tcaA and yycG (and possibly in other genes associated with the GISA phenotype) remain to be determined in order to predict the GISA phenotype from genomic data. Likewise, resistance to linezolid can arise from the acquisition of the cfr gene (which encodes a 23S rRNA methyltransferase) [44] and/or from mutations in the 23S rRNA gene. S. aureus harbours five or six copies of this gene, and the linezolid MIC increases along with the number of mutated copies [45]. Recovering the distinct copies of the gene using short reads is expected to be challenging, as the assembly typically yields only one consensus copy of the gene. Hence, the identification of mutations would require the re-mapping of reads against the consensus copy, or the use of long-read sequencing methods. We identified four genotype-to-phenotype studies related to S. aureus [27,46-48].
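The re-mapping of reads against a single consensus copy of the 23S rRNA gene can be sketched as follows, under strong simplifying assumptions that are ours and not those of the cited studies: all copies of the gene collapse into one consensus copy, every copy contributes the same number of reads, and sequencing errors are negligible. Under these assumptions, the fraction of reads carrying the variant at a given position approximates the fraction of mutated copies. The function name and the read counts are hypothetical.

```python
def estimate_mutated_copies(ref_reads, alt_reads, n_copies=5):
    """Estimate how many gene copies carry a variant from read counts at one position.

    ref_reads: reads supporting the consensus base
    alt_reads: reads supporting the variant base
    n_copies:  assumed number of 23S rRNA gene copies (five or six in S. aureus)
    """
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this position")
    alt_fraction = alt_reads / total
    # Nearest whole number of copies (Python rounds exact halves to the even integer).
    return round(alt_fraction * n_copies)


# Hypothetical pile-up: 40 of 100 reads carry the variant, five copies assumed.
print(estimate_mutated_copies(ref_reads=60, alt_reads=40, n_copies=5))
```

Such an estimate would be sensitive to uneven coverage and mapping bias, which is one reason why long-read sequencing, able to resolve each copy separately, is mentioned as an alternative.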
For penicillin resistance, the genomic prediction consists in the detection of the blaZ gene. In the Bradley et al. study, a high rate of ME was observed (11.7%), likely because of the lack of sensitivity of the phenotypic methods (Becton-Dickinson Phoenix and nitrocefin disks in this study) that served as comparators [47]. Besides, in the Gordon et al. study, a careful inspection of the blaZ sequence revealed in six cases a base insertion or deletion causing a frameshift [27]. As for methicillin, very good performances were found, the VME being caused by an overexpression of blaZ and the ME by a likely low expression of mecA. The highest rate of VME was observed for ciprofloxacin (1.2-4.6%, Table 1). While re-testing revealed that some of the strains were indeed susceptible, others remained resistant and no explanation could be given [27]. An overexpression of the NorA pump, for which the genetic determinants are not precisely known, could explain such resistance [49]. A limited number of gentamicin-resistant strains could be tested in the Gordon et al. and Bradley et al. studies, yet some VME were observed, with no explanation. Conversely, no ME were found. As for vancomycin, no GISA were included in the strain datasets, so the VME rate could not be assessed. Last, no study included linezolid in the panel of tested antibiotics.

Discussion
In line with the recent EUCAST consultation [50], using WGS to infer the antibiotic susceptibility pattern of HAP-causing pathogens requires more studies to address the current limitations. Indeed, solid data on E. coli, P. aeruginosa, S. aureus and, to a lesser extent, K. pneumoniae have been published, but none about the other HAP-causing pathogens such as other Enterobacteriaceae. The performance of WGS for inferring the antibiotic susceptibility profiles of E. coli and S. aureus was high, with few discrepancies as compared with conventional methods. In particular, the prediction for first-line antibiotics such as methicillin for S. aureus and 3GC for E. coli was correct in more than 99% and 97% of strains, respectively. From a CMg perspective, a rapid NGS-based test could hence allow a rapid adaptation of the empirical antibiotic therapy in case of either resistance or susceptibility to those pivotal antibiotics. We can expect that in species with a similar background such as non-AmpC-producing Enterobacteriaceae (P. mirabilis, C. koseri, K. oxytoca), WGS would predict antibiotic susceptibility with the same level of accuracy. As for AmpC-producing Enterobacteriaceae, the prediction of 3GC susceptibility is expected to be tricky given that the mutational events leading to the overexpression of AmpC are barely known. Still, for these Enterobacteriaceae, a fourth-generation cephalosporin (e.g. cefepime) that resists hydrolysis by AmpC should be considered in the first line of treatment. As cefepime resistance occurs through the acquisition of ESBLs, the correct prediction rates for cefepime susceptibility should be high.
In P. aeruginosa, the correlation between genotype and phenotype is less obvious for meropenem, amikacin and levofloxacin, likely due to the overexpression of the various chromosomal efflux pumps or other unexpected mechanisms [51]. As of now, the precise set of mutations associated with the expression of the efflux pumps is not known. In this case, bioinformatic approaches such as machine learning should be applied to a large number of isolates in order to associate mutational events with a resistant phenotype, and/or transcriptional data should be used to correlate the expression of genes with the phenotype [51]. Such approaches should also be useful to predict the antibiotic susceptibility profiles of A. baumannii and S. maltophilia.
CMg adds even more complexity than WGS as it raises the issue of linking ARGs to their hosts. While some ARGs of HAP-causing pathogens are chromosomally encoded, several are located on mobile genetic elements that are commonly shared among these bacteria. Hence, in case of polymicrobial samples (which are not rare in HAP), linking an ARG with a pathogen remains speculative at best. In a CMg study on bone and joint infection samples, we tried to use the respective depths of sequencing of ARGs and contigs from pathogens to infer some associations (i.e. whether a pathogen would harbour an ARG), but this approach proved inaccurate, suggesting that only a fraction of the bacterial population of a given species could carry the ARG [11]. Linking mutational events should be easier, since they occur in chromosomal genes that can be identified as deriving from a given species. Hence, in CMg for polymicrobial samples, it is difficult to assess the individual antibiotic susceptibility profiles, and the current approach is to consider a comprehensive antibiotic susceptibility profile of the bacteria identified in the sample by including all relevant ARGs, mutational events and intrinsic phenotypes [11]. Moreover, the genomes of pathogens must be sufficiently well assembled to detect all the possible ARGs and mutational events linked to antibiotic resistance.
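A minimal sketch of such a comprehensive, sample-level profile is given below. The function name, the data structures and the small lookup tables are illustrative assumptions; the few entries shown are taken from statements made earlier in this review and are far from exhaustive.

```python
# Illustrative lookup tables (assumed structure, deliberately tiny).
INTRINSIC = {
    "Stenotrophomonas maltophilia": {"carbapenems"},
    "Klebsiella pneumoniae": {"penicillins"},
}
ACQUIRED_ARGS = {
    "mecA": {"methicillin"},
    "blaZ": {"penicillins"},
    "blaCTX-M": {"third-generation cephalosporins"},
}


def comprehensive_profile(species, args, mutation_calls=()):
    """Union of all resistances flagged for a sample, without assigning ARGs to hosts.

    species:        species identified in the sample
    args:           acquired ARGs detected in the sample
    mutation_calls: antibiotics flagged as resistant by mutational analysis
    """
    resistant = set()
    for name in species:
        resistant |= INTRINSIC.get(name, set())
    for gene in args:
        resistant |= ACQUIRED_ARGS.get(gene, set())
    resistant.update(mutation_calls)
    return resistant


# Hypothetical polymicrobial sample.
profile = comprehensive_profile(
    species=["Stenotrophomonas maltophilia", "Staphylococcus aureus"],
    args=["mecA"],
    mutation_calls=["fluoroquinolones"],
)
print(sorted(profile))
```

Taking the union is conservative by design: an antibiotic is flagged as soon as any determinant in the sample points to resistance, at the cost of possibly discarding a drug that would be active against every pathogen actually present.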
In conclusion, the genotype-to-phenotype translation appears to be compelling for some HAP pathogens such as E. coli and S. aureus. More data are expected for other Enterobacteriaceae, and new approaches are needed for P. aeruginosa, A. baumannii and S. maltophilia. Meanwhile, CMg data on these pathogens should be carefully interpreted.

Figure 1: Typical bioinformatics flow-chart of the genotype-to-phenotype studies.

Table 1: Summary of the performances of the genotype-to-phenotype studies performed on Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa and Staphylococcus aureus. N: number of tested strains. ARG: antibiotic resistance gene. 3GC: third-generation cephalosporins. C: correct (WGS agrees with conventional methods). ME: major errors (WGS predicts resistance while the strain was tested susceptible). VME: very major errors (WGS predicts susceptibility while the strain was tested resistant). Results after re-testing the phenotype with gradient diffusion are shown here. c: Results for ceftriaxone. d: Results for meropenem. e: Results for ciprofloxacin. f: Some strains of the set were clones. All strains were multidrug-resistant isolates. g: Results for levofloxacin. h: Results inferred from kanamycin. ME percentages in bold indicate that fewer than 10 susceptible strains were tested, while VME percentages in bold indicate that fewer than 10 resistant strains were tested.