To illustrate the capabilities and accuracy of GPathFinder, three cases were chosen for detailed study. They span a range of ligand sizes (6 to 29 atoms) and protein sizes (231 to 477 residues), as well as different levels of prior knowledge against which to compare the results: from no knowledge about the specific system (only about other members of the same family) to X-ray structures of different positions of the ligand along the binding process.
2.1. Transport of Glycerol Across Aquaporin
Aquaporins (AQPs) are a family of transmembrane tube-shaped proteins that allow the transport of water and other small neutral molecules into and out of cells. Present across all domains of life, they are of substantial biological importance in the metabolism of organisms and are involved in several human diseases, such as abnormalities of kidney function, loss of vision or onset of brain edema [26]. The first aquaporin was reported in 1993 by Agre and coworkers [27], and many other AQPs have been characterized since then.
Traditionally, AQPs have been classified into two main sub-families according to their function [28]: strict aquaporins, which are selective only to water, and aquaglyceroporins, which can transport water and other small molecules such as glycerol. A third class was added following the characterization of an archaeal aquaporin in 2003 [29], and a more complex classification has recently been proposed to accommodate the latest, unexpectedly diverse additions to the AQP family [30].
Despite this high diversity, the general tertiary and quaternary structures are retained throughout the family: six transmembrane domains (TMs), three extracellular and two intracellular loops, and two inverted hemihelices termed HB and HE (Figure 1).
How different AQPs achieve selectivity for different permeants is believed to result from a combination of channel size restriction and intermolecular interactions in specific regions, namely the two Asn-Pro-Ala motifs and the aromatic/arginine selectivity filter (SF) [26,31,32,33]. For example, the narrowest point of the channel has a diameter of ~2 Å, ~2.5 Å and ~3.4 Å for AqpZ (aquaporin Z, strict aquaporin), AqpM (archaeal aquaporin) and GlpF (aquaglyceroporin), respectively [33,34,35]. In addition, different residues compose the SF in each case (Table 1 and Figure 1b–d).
In this case study, the objective was to use GPathFinder to identify possible differences in geometric constraints and binding energy profiles (with special interest in the SF region) when transporting glycerol across three AQPs known to be selective to different permeants (Table 1): AqpZs are permeable only to water, GlpFs are also permeable to glycerol (besides other small molecules), and AqpMs lie in between, with a much lower permeability to glycerol than GlpFs; some authors [33] have even inferred that glycerol could be ruled out as an AqpM permeant.
A first set of calculations was run to shed light on possible geometric constraints in the transport of glycerol. The extracellular and intracellular regions were selected as the initial and final points, respectively. Thus, the solutions proposed by GPathFinder are expected to be similar to each other in shape and length. The optimization used only the geometric criterion, through the evaluation of average steric clashes. A second experiment focused on the SF vicinity: a glycerol molecule was set up to cross the SF (a ~10 Å section of the channel centered on the SF was chosen). In addition to the average steric clashes criterion, the binding energy (average Vina score over all frames) was taken into account in the pathway optimization.
Ten runs per AQP were carried out with GPathFinder, and a pool of 100 pathways was obtained for each AQP in the first experiment. Two values were analyzed (Table 2): the average volume overlap over all frames of the pathway and the volume overlap of the frame with the highest value (i.e., the bottleneck for glycerol along the trajectory). Average clashes along the trajectory are ~2.4 times higher for the strict aquaporin than for the aquaglyceroporin, suggesting that the overall geometric configuration of the channels could play a role in glycerol permeability, which agrees with previously reported data [26,32]. The archaeal aquaporin has an average value closer to GlpF than to AqpZ, which is consistent with experimental data [33] in which glycerol molecules were found in all parts of the channel except the SF region. The frames with the highest clashes are always those with the ligand in the vicinity (distance ≤ 5 Å) of the SF residues, supporting the hypothesis that the geometric configuration of this zone plays an important role in permeability. Again, the maximum values found for AqpM are closer to GlpF than to AqpZ, in agreement with the hypothesis that the SF of AqpM could be permeable to glycerol if only geometric constraints are taken into account.
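The two quantities analyzed above (average volume overlap and the bottleneck frame) can be illustrated with a minimal sketch. It assumes that the per-frame volume overlaps of one pathway are already available as a plain list; the function name `clash_summary` and the numbers are hypothetical and are neither part of GPathFinder nor data from this study.

```python
from statistics import mean


def clash_summary(frame_clashes):
    """Summarize one pathway from its per-frame volume overlaps (in Å^3).

    Returns (average overlap, maximum overlap, index of the bottleneck frame).
    The input layout is an assumption made for illustration only.
    """
    if not frame_clashes:
        raise ValueError("pathway has no frames")
    # The bottleneck is the frame with the highest volume overlap.
    bottleneck = max(range(len(frame_clashes)), key=frame_clashes.__getitem__)
    return mean(frame_clashes), frame_clashes[bottleneck], bottleneck


# Illustrative values only, not results from the study.
pathway = [1.0, 2.0, 9.0, 4.0]
avg, worst, idx = clash_summary(pathway)
```

Applying such a summary to every pathway in a pool and averaging per AQP would give values comparable in spirit to those discussed for Table 2.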
In the second experiment, two objectives (clashes and Vina score) were minimized simultaneously. Because the trajectories resulting from the ten runs are quite similar in shape and length, a suitable way to analyze the results is to pool all the solutions for each AQP and extract their Pareto front. This selection yields a total of 230 pathways (85 for AqpZ, 72 for AqpM and 73 for GlpF).
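Extracting a Pareto front from pooled solutions can be sketched as follows, assuming each solution is reduced to a pair (average clashes, average Vina score) and that both objectives are minimized. The names `dominates` and `pareto_front` and the sample values are hypothetical; this is a generic non-dominated filter, not the GPathFinder implementation.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(o, s) for o in solutions)]


# Illustrative (clashes, Vina score) pairs pooled from several runs.
pooled = [(10.0, -5.0), (12.0, -6.0), (11.0, -4.0), (15.0, -6.0), (9.0, -3.0)]
front = pareto_front(pooled)
# front keeps (10.0, -5.0), (12.0, -6.0) and (9.0, -3.0)
```

This quadratic-time filter is adequate for pools of a few hundred pathways, which is the scale described here.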
The analysis of the Pareto front plot (Figure 2) reveals a clear ranking in the pathway evaluation: the GlpF solutions show the best clashes-Vina relation, followed by those for AqpM, with AqpZ being the worst case. This leads to the following conclusions:
Intermolecular interactions at the SF region (together with its geometric configuration) have a clear influence on the permeability/non-permeability to glycerol of GlpF/AqpZ.
A significant difference is observed in the intermolecular interactions at the SF region of GlpF and AqpM, which may help explain their different glycerol diffusion rates.
2.2. Unbinding of a Suicide Inhibitor from hIDO1
Human indoleamine 2,3-dioxygenase 1 (hIDO1) is a heme-containing enzyme present in many tissues that catalyzes the dioxygenation of tryptophan. It is considered an important target for cancer immunotherapies, due to evidence that its inhibition could help to prevent the interference of hIDO1 with tumor cell function [38,39].
Among the inhibitors that have been developed for hIDO1, only one, BMS-986205 (BMS), acts as a suicide inhibitor. BMS inhibits the enzyme permanently by binding to the apo form of hIDO1 at the heme location, thereby preventing cofactor incorporation [40].
An X-ray structure of hIDO1 was reported in 2005 [41], proposing the ligand/substrate entrance between helices K-L and N, near the RS-loop. A more recent study [42] characterized three X-ray structures that display different steps along the binding mechanism of the BMS inhibitor and thus proposed a clear binding pathway (Figure 3).
In this case study, we aimed to find binding pathways of the inhibitor and to compare the results with the experimental interpretation. As seen before, BMS exhibits unusual flexibility along the binding process, which, together with the structural changes observed in the receptor, represents a significant challenge for ligand pathway predictors. A set of five runs was carried out, starting the simulations with the inhibitor molecule at the heme binding site (PDB code 6mq6, chain B). No restrictions were imposed on the direction that the inhibitor has to follow to leave the enzyme. The optimization of the solutions was configured to take into account both steric clashes and binding energy.
A pool of 293 solutions was obtained from the calculations. Non-dominated solutions with respect to both clashes and Vina scores were selected, leading to a total of 50 solutions that make up the Pareto front (Figure 4). Among them, two different pathway morphologies (i.e., sets of points that make up the trajectory) were identified (Figure 5); the other 48 solutions follow the same direction as these two but differ in ligand or protein conformation in one or more frames.
In all these pathways, the ligand enters between helices K-L and N, making use of the hole near the RS-loop proposed in [41] rather than passing nearer to the LM-loop via the W237-switch mechanism. This preference could be explained by the inability of NMA to find a conformation with a large enough displacement in this specific region (due to its global nature) or, more simply, by the fact that all the solutions calculated with GPathFinder have a better score than those passing nearer to W237, which were therefore discarded during the evolutionary process of the algorithm. A further MD study could probably provide some insight into this issue, but that is beyond the scope of the present work.
The solution with the lowest Vina score and average clashes under 50 Å³ was selected for further analysis (Figure 6a), revealing that the BMS inhibitor adopts different conformations along its binding process (similar to the “extended”, “kinked” and “bent” conformers found in [42]). It should be noted that hydrophobic interactions are the main recognition factor of the binding process in the analyzed pathway (Figure 6b–d).
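The selection rule used above (best Vina score among solutions whose average clashes stay under 50 Å³) amounts to a constrained minimum. A minimal sketch follows, assuming each solution is a dictionary with hypothetical keys `avg_clashes` and `vina`; the function name `pick_solution` and the sample values are illustrative only.

```python
def pick_solution(solutions, max_clashes=50.0):
    """Return the solution with the lowest Vina score among those whose
    average clashes are strictly below max_clashes (Å^3), or None."""
    feasible = [s for s in solutions if s["avg_clashes"] < max_clashes]
    if not feasible:
        return None
    return min(feasible, key=lambda s: s["vina"])


# Illustrative values only, not results from the study.
candidates = [
    {"name": "p1", "avg_clashes": 62.0, "vina": -9.1},  # best score, too many clashes
    {"name": "p2", "avg_clashes": 41.5, "vina": -8.4},
    {"name": "p3", "avg_clashes": 30.0, "vina": -7.9},
]
chosen = pick_solution(candidates)
```

Here `p1` has the best score but is excluded by the clash threshold, so `p2` is returned.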
2.3. Human Cytochrome P450 2C19
Cytochrome P450s (CYPs) are a large family of heme-containing enzymes with a wide variety of functions, including steroid, vitamin A and D, fatty acid and eicosanoid metabolism [43]. They are responsible for metabolizing ~75% of the available small-molecule drugs [44], with CYP2C19 being one of the five with the highest activity [45].
Although the overall secondary structure and folding are conserved across the family, there is significant structural variability in the active site and the access/egress routes [46]. For example, a comparison of the CYP2C19-0XV complex (PDB code 4gqs) and the CYP2C9-flurbiprofen complex (PDB code 1r9o) indicates deviations greater than 3.0 Å for equivalent Cα atoms of the outer substrates of their binding cavities [47]. Here, 0XV refers to the compound (4-hydroxy-3,5-dimethylphenyl)(2-methyl-1-benzofuran-3-yl)methanone.
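A comparison of this kind can be illustrated with a short sketch that flags equivalent Cα atoms deviating by more than a threshold. It assumes the two structures have already been superposed and that the Cα coordinates are supplied as paired lists in the same residue order; the function name `large_deviations` and the coordinates are hypothetical, and no structure file is parsed.

```python
import math


def large_deviations(ca_a, ca_b, threshold=3.0):
    """Return indices of paired C-alpha atoms whose distance exceeds
    threshold (Å). Both lists must hold (x, y, z) tuples in the same
    residue order, taken from already superposed structures."""
    if len(ca_a) != len(ca_b):
        raise ValueError("coordinate lists must have the same length")
    return [i for i, (p, q) in enumerate(zip(ca_a, ca_b))
            if math.dist(p, q) > threshold]


# Illustrative coordinates only: paired distances are 1.0, 4.0 and 3.0 Å.
struct_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
struct_b = [(0.0, 0.0, 1.0), (1.0, 4.0, 0.0), (3.0, 0.0, 0.0)]
flagged = large_deviations(struct_a, struct_b)
```

Only the second pair exceeds 3.0 Å; the third pair sits exactly at the threshold and is not flagged because the comparison is strict.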
Several (un)binding routes were identified in a 2007 computational study using all the CYP structures available at the time [48], which also defined the nomenclature for the pathways used in this study (Figure 7). The channel or channels used were found to differ depending on the specific CYP and substrate combination (even for the same CYP with different substrates), which suggests that channels play an important role in substrate selectivity [46,49].
Here, we aimed to provide some insight into the channel or channels used by 0XV to access the CYP2C19 active site, which, to the best of our knowledge, have not yet been characterized. Fifty runs minimizing only steric clashes and 50 runs taking into account both geometric and binding energy criteria were carried out using the default parameters for the GPathFinder input file. The substrate molecule was placed in its binding position (PDB code 4gqs), and no restrictions were imposed on the direction of exit from the protein.
In this case, the results are similar in both experiments (Table 3), suggesting that the geometric constraints of the different routes are more important than intermolecular interactions in the selectivity of CYP2C19 for 0XV. The only exception is the appearance of the “2a” and “2b” routes (frequencies of 13.4% and 6.9%, respectively) when the Vina score is taken into account in the evaluation. In the majority of the solutions, 0XV used the solvent channel to access the binding site; this channel is present in half of the P450s studied by Cojocaru et al. [48]. The second most frequent route is channel 2c, between the G and I helices and the B’ helix/BC-loop. Other routes, observed with very low frequency, are 2e and 2ac. Together, these well-characterized P450 pathways account for more than 95% of the results in both studies, meaning that GPathFinder was able to find good results in a complex system using a standard input file.
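Route frequencies such as those discussed above can be tabulated once every solution has been assigned a channel label. The sketch below assumes that labelling has already been done (for instance by visual inspection); the function name `route_frequencies` and the sample labels are hypothetical and do not reproduce the values of Table 3.

```python
from collections import Counter


def route_frequencies(labels):
    """Return the percentage of solutions per channel label,
    ordered from most to least frequent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {route: 100.0 * n / total for route, n in counts.most_common()}


# Illustrative labels only, one per solution.
labels = ["solvent", "solvent", "2c", "solvent", "2a"]
freqs = route_frequencies(labels)
```

The returned dictionary lists the most frequent route first, which makes the dominant channel easy to read off.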