3.1. Evaluation of the Performance of AutoDock and Vina
The chemical and structural properties of different proteins and enzymes can vary significantly, in features that include the nature, type, and range of interactions around the binding pocket, the pocket size and shape, and the exposure to solvent. Therefore, the challenges that such systems pose to docking and to virtual screening can also differ considerably. Some programs and scoring functions capture some of these characteristics better, while others perform better on targets with other features.
Table 2 compares the performance of AutoDock and Vina across the different classes of targets. The average results obtained for the set of 101 targets showed that AutoDock and Vina exhibit a similar average performance in discriminating between ligands and decoys. The average EF1% values obtained were 7.6 and 8.9 for Vina and AutoDock, respectively (AUCs of 68.0 and 66.4). These EF1% values show that, among the top 1% of all compounds (actives and decoys) docked against each target, Vina and AutoDock ranked 7.6 and 8.9 times more actives, respectively, than would be expected from random selection, given the relative proportion of actives and decoys available for each target.
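The two metrics used above can be illustrated with a minimal sketch. It assumes that a lower (more negative) docking score means a better rank, as is the case for the binding energies reported by AutoDock and Vina. The function names and the rounding rule used to size the top 1% are illustrative assumptions, not the procedure used in this study. The AUC is computed as the probability that a randomly chosen active scores better than a randomly chosen decoy, which is equivalent to the area under the ROC curve; it is returned on a 0 to 1 scale, so it must be multiplied by 100 to compare with the values quoted in the text.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor in the top `fraction` of the ranked list.

    Assumption: lower docking score = better rank.
    """
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    # Rank compounds from best (lowest energy) to worst.
    order = sorted(range(n_total), key=lambda i: scores[i])
    # Size of the top slice; rounding is an illustrative choice.
    n_top = max(1, round(n_total * fraction))
    hits_top = sum(is_active[i] for i in order[:n_top])
    # Active rate in the top slice relative to the overall active rate.
    return (hits_top / n_top) / (n_actives / n_total)


def roc_auc(scores, is_active):
    """ROC AUC as P(active scores better than decoy); ties count 0.5."""
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))
```

An EF1% of 1 corresponds to random selection, and the maximum attainable value depends on the ratio of actives to decoys for each target. The pairwise AUC loop is quadratic in the number of compounds; for large libraries a rank-based formula would be preferable.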
However, the discrimination ability varied significantly across target classes. For GPCRs, for example, AutoDock exhibited superior discrimination ability, with an average EF1% of 16.6 against only 2.8 with Vina. AutoDock also demonstrated improved performance over Vina for nuclear receptors (EF1% of 18.4 versus 15.0). However, for kinases and metalloproteins, the discrimination ability of Vina was on average better than that of AutoDock.
Figure 3 shows the average AUC values calculated for the different target families. As previously mentioned, the higher the AUC, the better the discrimination between actives and decoys. AutoDock provided better results for GPCRs, ion channels, and nuclear receptors. Vina worked better for all the other families.
However, within large protein families, the docking results can vary significantly between individual proteins. In the case of metalloenzymes, for example, Vina provided better results on average. Analyzing each target individually (Figure 4), however, shows that AutoDock performed significantly better for some targets. This might be explained by the large variability of protein types within this family, which includes kinases, proteases, and others.
Table 3 analyzes the performance of AutoDock and Vina taking into consideration the number of amino acid residues that constitute the target. For smaller targets, the driving force for ligand binding tends to be concentrated in a smaller number of key specific residues. Additionally, the binding pockets tend to be smaller, or often more exposed to the solvent. On the other hand, in larger protein targets, the range of interactions involved in ligand binding tends to be larger and more diffuse. In addition, the extra amino acid residues present in the larger targets can confer a more controlled environment on the corresponding binding pockets, shielding the interactions formed from the effect of the solvent. The non-specific protein environment can play a more important role in ligand binding in these targets. Therefore, targets of different sizes can pose different challenges for docking and virtual screening.
The results from Table 3 show that Vina was, on average, better at discriminating ligands from decoys in medium-sized targets, with 250 to 400 amino acid residues (average EF1% of 11.5, AUC of 71.8). For targets with more than 400 amino acid residues, the performance of Vina was significantly lower (average EF1% of only 6.1, AUC of 65.4).
AutoDock exhibited a more uniform behavior, with average EF1% values in the range 7.9–9.4 for small (<250 aa) and large (>400 aa) targets, and thus outperformed Vina in both of these size classes.
Another important aspect regarding the nature of the target protein concerns the type of amino acid residues that constitute each binding pocket. For this analysis, all amino acid residues defining each binding pocket were grouped into polar, charged (negative and positive), and hydrophobic amino acid residues. Binding pockets were characterized based on the relative percentage of each of these types of residues. Average EF1% and AUC values were calculated with AutoDock and Vina for each category. The results are presented in Table 4.
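The pocket characterization described above can be sketched as follows. The assignment of each residue to a class is an assumption made for illustration (the exact grouping used in this study is not given in the text, and histidine in particular could be placed differently), as are the function names. The thresholds for the polarity classes follow the values quoted in the discussion of Table 4 (less than 25% and more than 35% polar residues).

```python
from collections import Counter

# Illustrative residue classification (an assumption, not the paper's table).
RESIDUE_CLASS = {
    "SER": "polar", "THR": "polar", "ASN": "polar", "GLN": "polar",
    "TYR": "polar", "CYS": "polar", "HIS": "polar",
    "LYS": "positive", "ARG": "positive",
    "ASP": "negative", "GLU": "negative",
    "ALA": "hydrophobic", "VAL": "hydrophobic", "LEU": "hydrophobic",
    "ILE": "hydrophobic", "MET": "hydrophobic", "PHE": "hydrophobic",
    "TRP": "hydrophobic", "PRO": "hydrophobic", "GLY": "hydrophobic",
}
CLASSES = ("polar", "positive", "negative", "hydrophobic")


def pocket_composition(residues):
    """Percentage of each residue class among the pocket residues.

    `residues` is a list of three-letter residue names.
    """
    if not residues:
        raise ValueError("empty binding pocket")
    counts = Counter(RESIDUE_CLASS[name.upper()] for name in residues)
    return {cls: 100.0 * counts.get(cls, 0) / len(residues) for cls in CLASSES}


def polarity_class(pct_polar):
    """Bin a pocket by its percentage of polar residues."""
    if pct_polar < 25:
        return "poorly polar"
    if pct_polar <= 35:
        return "moderately polar"
    return "very polar"
```

For a hypothetical pocket such as `["SER", "THR", "ASP", "LYS", "LEU", "VAL", "PHE", "ILE"]`, `pocket_composition` gives 25% polar, 12.5% positive, 12.5% negative, and 50% hydrophobic residues. The total charged percentage is the sum of the positive and negative entries.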
The results presented in Table 4 showed that for poorly polar binding pockets (less than 25% of polar residues) AutoDock was on average better than Vina in discriminating between ligands and decoys, particularly among the top 1% of ranked solutions. For moderately polar and very polar binding pockets, Vina exhibited a better performance than AutoDock. The results also showed that both programs had more difficulty in discriminating ligands and decoys for very polar binding pockets (>35% of polar amino acid residues).
In terms of the percentage of hydrophobic residues, the results showed that Vina was significantly better than AutoDock in ligand/decoy discrimination for poorly hydrophobic binding pockets. As the percentage of hydrophobic residues at the binding pocket increased, the performance of Vina and AutoDock became increasingly similar, both in terms of EF1% and in terms of AUC values.
In terms of charge, the results showed that AutoDock was better at discriminating ligands and decoys in poorly charged binding pockets (<15%) than in moderately or highly charged ones. Vina, on the other hand, gave its best results in highly charged binding pockets. These general tendencies concerning the presence of charge at the binding pocket were also observed when looking specifically at positively charged residues or at negatively charged residues.
In general, these results showed that AutoDock was better in discriminating ligands and decoys in more hydrophobic, poorly polar, and poorly charged pockets, while Vina exhibited early recognition metrics that did not vary so significantly with the type of amino acid residues at the binding pocket. Vina tended to give better results for polar and charged binding pockets, which was particularly interesting, taking into consideration that the scoring function of Vina did not explicitly include charges, while that of AutoDock had an explicit electrostatic term.
3.2. Substrates
The type of molecule to be evaluated and its physico-chemical characteristics also pose different challenges for virtual screening, in terms of docking and its ability to discriminate between actives and decoys. For each specific target, the decoys included in DUD–E were generated to have 1-D physico-chemical properties similar to those of the actives from which they originated, to remove bias [32]. Hence, to analyze how the different substrate properties affected the discriminating ability for each target, the physical properties of the actives ranked in the top 1% were evaluated and compared with those of the actives that were ranked worst.
In this study, four fundamental properties of the ligands were analyzed: size, polarity, charge, and the number of rotatable bonds.
Figure 5 and Figure 6 present heat maps of the correlation between the substrate properties and their position in the ranking for two target families (proteases and metalloenzymes, respectively). Darker red (+1) indicates a perfect positive correlation, while darker blue (−1) indicates a perfect negative correlation. From Figure 5, it is clear that polarity and the number of rotatable bonds are important for Vina and even more so for AutoDock: both properties show a positive correlation with the ranking, that is, as the ranking number increases, the polarity and the number of rotatable bonds also increase. This means that molecules that are more polar and have more rotatable bonds are ranked worse in the list. This leads to the conclusion that more polar and more flexible molecules present a bigger challenge, for AutoDock in particular. For metalloenzymes, the correlation profile differs somewhat from that of proteases. It is not easy to find a clear tendency, because while some targets present a positive correlation for a given property, others have a negative correlation for the same property. This could again be explained by the large variability of protein types in this particular family.
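Each cell of such a heat map is a correlation coefficient between one ligand property and the ranking position of the actives for one target. The text does not state which coefficient was used; the sketch below assumes the Pearson coefficient, and the function name and example values are purely illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Hypothetical actives for one target: ranking position (1 = best)
# and number of rotatable bonds. These values are made up for illustration.
rank_position = [3, 15, 40, 120, 300]
rotatable_bonds = [2, 4, 5, 9, 12]
r = pearson(rank_position, rotatable_bonds)  # positive: flexible ligands rank worse
```

A value close to +1 means that the property increases as the ranking position gets worse, which is the situation described above for polarity and rotatable bonds in proteases. A rank-based coefficient such as Spearman's would be a reasonable alternative, since ranking positions are ordinal.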
3.2.1. Influence of Molecular Weight
Figure 7 summarizes the variability of all molecules present in the DUD–E dataset in terms of molecular weight. Of the total of 22,321 active ligands considered for all 101 DUD–E targets, 7990 have a molecular weight below 400 Da, 8833 have a molecular weight in the range of 400–500 Da, and 5498 have a molecular weight over 500 Da. The distribution of decoys across these ranges was the same, as they were generated automatically from the known ligands included.
Table 5 breaks down the number of actives identified in the top 1% of ranked compounds according to molecular weight. AutoDock identified a total of 1935 actives in the top 1%, while Vina identified 2002. The results showed that Vina was, on average, better than AutoDock at identifying actives in the top 1% for small ligands (<400 Da) (536 versus 395 actives) and for large ligands (>500 Da) (581 versus 497 actives). However, AutoDock was able to rank more medium-sized actives (400–500 Da) among the top 1% of the results (1043 versus 885).
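Because the three molecular weight classes contain different numbers of actives, one way to read these counts is relative to the size of each class. The short sketch below uses only the counts quoted in the text; the normalization itself and the variable names are illustrative and are not part of the original analysis.

```python
# Number of actives per molecular weight class (Da), as quoted in the text.
ACTIVES_TOTAL = {"<400": 7990, "400-500": 8833, ">500": 5498}

# Actives found in the top 1% by each program, as quoted in the text.
TOP1_ACTIVES = {
    "AutoDock": {"<400": 395, "400-500": 1043, ">500": 497},
    "Vina": {"<400": 536, "400-500": 885, ">500": 581},
}


def recovery_by_class(top1_counts, totals):
    """Percentage of the actives in each class that reached the top 1%."""
    return {cls: 100.0 * top1_counts[cls] / totals[cls] for cls in totals}


for program, counts in TOP1_ACTIVES.items():
    recovered = recovery_by_class(counts, ACTIVES_TOTAL)
    summary = ", ".join(f"{cls} Da: {pct:.1f}%" for cls, pct in recovered.items())
    print(f"{program}: {sum(counts.values())} actives in top 1% ({summary})")
```

The per-class counts add up to the reported totals (1935 for AutoDock, 2002 for Vina, and 22,321 actives overall), and the normalized values make it easier to compare classes of different sizes.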
All protein families exhibited the same tendency: smaller ligands were more difficult to discriminate and appeared at worse ranking positions for both Vina and AutoDock.
Figure 8 shows the influence of molecular weight on the average ranking distribution of the molecules within the full ranked list determined for each protein target. The results showed a similar tendency for the GPCR and kinase protein families, where the smaller ligands were ranked worst and the medium ligands were ranked better. For both GPCRs and kinases, AutoDock ranked smaller ligands better than Vina, even though these ligands still appeared at relatively poor ranking positions. As for the medium-sized active molecules (300–400 Da), these two families exhibited opposite results: while Vina provided better recognition for kinases, AutoDock was more effective in discriminating actives and decoys for GPCRs.
3.2.2. Influence of the Number of Rotatable Bonds
Figure 9 presents the relative distribution of all active ligands in the DUD–E dataset according to the number of rotatable bonds. Molecules with 4 to 7 and with 8 to 11 rotatable bonds are the most prevalent, together representing 73% of the dataset. The remaining 27% corresponds to molecules with 0 to 3 or with 12 or more rotatable bonds.
Ligands with more rotatable bonds present a greater challenge for docking because they can adopt a larger number of possible conformations. Discriminating actives with many rotatable bonds from decoys with many rotatable bonds hence becomes more difficult, because correctly identifying the real pose of the ligand is more challenging. Consequently, ligands with a higher number of rotatable bonds were placed at worse positions in the ranked list than ligands with fewer rotatable bonds. In this study, this was observed for all studied families.
In Figure 10, the data for nuclear receptors and GPCRs are presented. For both families, AutoDock was able to rank more ligands early on. While in GPCRs there was a clear difference in the discrimination ability between Vina and AutoDock, for nuclear receptors both programs behaved similarly (with the exception of compounds with 4 rotatable bonds). According to our study, molecules with 5 to 10 rotatable bonds were predicted better by both AutoDock and Vina.