Pneumocystis pneumonia (PCP) is a potentially lethal fungal infection affecting patients with a compromised immune system, including patients with AIDS or autoimmune disorders, organ transplant recipients, and patients with other conditions requiring medically induced immunosuppression. The cause of PCP in humans is Pneumocystis jirovecii, which, like other fungi in the genus Pneumocystis, does not respond to commonly used antifungal therapies. A combination therapy consisting of trimethoprim and sulfamethoxazole (TMP-SMX), which targets the folate biosynthesis pathway at two enzymatic steps, remains the primary option for the treatment and prophylaxis of PCP.
In addition to concerns about the toxicity of the TMP-SMX treatment [1] and low tolerance to sulfa-based drugs in some patients [2], there is growing evidence of emerging resistance in the fungus, associated with acquired mutations in the targeted enzymes. Earlier studies simply reported sequence variants found in P. jirovecii dihydropteroate synthase (DHPS) and dihydrofolate reductase (DHFR), suggesting the development of resistance upon exposure to the drug [3]. Later, as the number of PCP patients unresponsive to TMP-SMX increased and the corresponding strains of the pathogen were sequenced, it became possible to draw statistically significant associations and estimate the risk of resistance upon prior exposure to the drug [5]. Finally, in vitro enzymatic assays and PjDHPS/PjDHFR heterologous systems based on the respective knockouts in Saccharomyces cerevisiae enabled the measurement of the kinetic parameters of these enzymes with the wild-type sequence and with the identified mutations [12].
Recently, a new quantitative model has been suggested to estimate the effect of missense mutations on drug resistance [18]. The model is based on a large-scale experiment in which Escherichia coli was treated with amoxicillin, followed by the sequencing of mutations in beta-lactamase (TEM-1) and measurement of the corresponding enzymatic activity [19]. The model, which predicts drug resistance by combining individual position-specific amino acid probabilities with amino acid co-variance scores, has been shown to outperform SIFT [20], PolyPhen2 [21], and a set of methods that predict the effect from the estimated change in stability of the mutated protein (I-Mutant [22], MUpro [23], and PoPMuSiC [24]). Co-variance scores reveal pairwise concerted changes of amino acids at different positions within a protein sequence and may represent “epistatic” interactions between the residues. Both position-specific probabilities and co-variance scores are derived from multiple sequence alignments (MSA). In this published model, co-variance scores are computed using one of the most advanced methods in the field of protein co-evolution analysis, Direct Coupling Analysis (DCA), which employs approaches from statistical thermodynamics to distinguish direct from transitive (indirect) co-variance relationships between residues at different positions in the protein [25]. However, the complexity of the DCA method limits the applicability of the presented quantitative drug resistance model. It requires extensive multiple sequence alignments, deals with well-defined domains only, cannot process multi-domain proteins or sequences longer than 500 amino acids, and is very computationally intensive [18]. For example, when the DCA-based model was evaluated on TEM-1 data, only a fraction of the mutations were considered, specifically those that fell within the Pfam domain and represented single mutations [18].
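The position-specific amino acid probabilities mentioned above can be illustrated with a minimal sketch. It is not the published model [18]: the toy alignment, the pseudocount of 1, the function names `column_probabilities` and `log_odds`, and the log-ratio score are all assumptions made here purely for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probabilities(msa, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> probability.

    Gaps and non-standard symbols are ignored; a pseudocount (an assumption
    of this sketch) keeps unseen residues from getting zero probability.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profiles = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profiles.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profiles


def log_odds(profiles, position, ref_aa, mut_aa):
    """Log ratio of mutant to reference residue probability (0-based column)."""
    column = profiles[position]
    return math.log(column[mut_aa] / column[ref_aa])


# Hypothetical four-sequence alignment; real MSAs are far larger.
toy_msa = ["MKVL", "MKIL", "MRVL", "MKVL"]
profiles = column_probabilities(toy_msa)

# Column 2 holds V, I, V, V, so with the pseudocount P(V) = 4/24 and P(I) = 2/24.
score = log_odds(profiles, 2, "V", "I")
print(f"log-odds of V->I at column 2: {score:.3f}")
```

A negative score indicates that the substituted residue is seen less often than the reference residue at that column of the alignment. How such per-position terms are weighted and combined with co-variance scores is specific to each model and is not reproduced here.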
We have recently developed a new tool for amino acid co-variance analysis, CoeViz [27], that overcomes most of the limitations listed above for DCA. In particular, CoeViz is limited neither by protein domain boundaries nor by the size of the MSA, can handle proteins of any length in a practical time frame, and generates co-variance scores using three metrics: Mutual Information (MI), chi-squared (χ²), and Pearson correlation (r). The tool accounts for phylogenetic bias in the MSA and also provides an alternative way of adjusting the MI scores using the average product correction (APC [28]).
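To make the MI metric and its APC adjustment concrete, the following sketch computes both for every pair of columns in a toy alignment. It is an illustration under stated assumptions, not the CoeViz implementation: the function names and the alignment are hypothetical, sequences are weighted equally (no correction for phylogenetic bias), and APC is taken in its usual form, the product of the two columns' mean MI divided by the overall mean MI.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        mi += p_ab * math.log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def mi_with_apc(msa):
    """Return (raw MI, APC-corrected MI) dicts keyed by column pair (i, j), i < j."""
    length = len(msa[0])
    columns = [[seq[i] for seq in msa] for i in range(length)]
    raw = {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(length), 2)
    }
    # Mean MI of each column with every other column, and the overall mean.
    col_mean = [
        sum(raw[(min(i, j), max(i, j))] for j in range(length) if j != i)
        / (length - 1)
        for i in range(length)
    ]
    overall_mean = sum(raw.values()) / len(raw)
    corrected = {}
    for (i, j), value in raw.items():
        apc = col_mean[i] * col_mean[j] / overall_mean if overall_mean > 0 else 0.0
        corrected[(i, j)] = value - apc
    return raw, corrected


# Hypothetical alignment: columns 0 and 1 co-vary perfectly, column 2 does not.
toy_msa = ["AAK", "AAK", "CCK", "CCR"]
mi, corrected = mi_with_apc(toy_msa)
for pair in sorted(mi):
    print(pair, round(mi[pair], 3), round(corrected[pair], 3))
```

In this toy case the perfectly co-varying pair (0, 1) keeps the highest score after correction, while the pairs involving the weakly coupled column 2 drop below zero. The correction subtracts the background signal that each column shares with all others.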
In this work, we have built a new model to evaluate the effect of mutations on resistance to drugs. In contrast to the DCA-based model, our approach considers the entire protein sequence and estimates the effect of mutations relative to the reference sequence. Moreover, our model can compute the effect of multi-position variants by considering them simultaneously. The new model was trained on kinetics data for the inhibition of PjDHFR by TMP and was further evaluated using experimental data for a different inhibitor targeting PjDHFR (OAAG324 [17]), as well as inhibition data for PjDHPS and Staphylococcus aureus DHFR, to estimate how well the model generalizes to different drugs, drug targets, and organisms.
Pathogens facing selective pressure, such as drug therapy or prophylactic treatment, are able to develop resistance to the drug through concerted mutations that impede the binding of an inhibitor or alleviate its action while retaining the essential function of the targeted endogenous protein. PCP exemplifies the problem of emerging resistance when the repertoire of therapeutics is limited. With the advent of targeted sequencing, it is now possible to quickly identify mutations in a resistant strain of the pathogen. However, comparative evaluation of the effect of these variants on drug susceptibility is lagging. We have developed a quantitative model that accounts for both individual changes and concerted mutations in the drug target to predict a protein’s resistance to an inhibitor.
Drug resistance in general, and resistance to antifolates in particular, may be conferred through alternative mechanisms. In addition to compensatory mutations [12], pathogens may employ amplification of the drug-targeted gene [32], reduction of cell wall permeability to the drug or encoding of alternative forms of the targeted gene [34], or activation of ATP-binding cassette (ABC) transporters and multidrug resistance (MDR) genes to efflux drugs out of the cell [35]. The proposed model cannot account for these strategies of resistance. Therefore, it most likely will not correlate strongly with the minimal inhibitory concentrations (MIC) commonly used to evaluate the overall drug resistance of a given pathogenic strain. MIC may be a complex function of the resistance mechanisms mentioned above, where mutations in the targeted protein may be important but are not the major factor determining the overall resistance.
Other limitations of the proposed model include the inability to quantify variants with insertions and deletions, as well as other mutations unrelated to drug resistance, and the lack of strong correlation between predictions and kinetic data for inhibitors whose mode of action differs from that of the drug(s) to which the target has developed resistance. Nevertheless, the model may help evaluate and compare resistant strains with known variants in the targeted protein and facilitate predictions of possible resistance conferred through concerted compensatory missense mutations. Such an approach would be quite valuable for microbial systems like Pneumocystis, which lack an in vitro cultivation system that could be used for such evaluations.