Article

Can We Still Trust Docking Results? An Extension of the Applicability of DockBench on PDBbind Database

Molecular Modeling Section, Department of Pharmaceutical and Pharmacological Sciences, University of Padova, 35131 Padova, Italy
*
Authors to whom correspondence should be addressed.
Int. J. Mol. Sci. 2019, 20(14), 3558; https://doi.org/10.3390/ijms20143558
Submission received: 17 June 2019 / Revised: 12 July 2019 / Accepted: 18 July 2019 / Published: 20 July 2019
(This article belongs to the Special Issue New Avenues in Molecular Docking for Drug Design)

Abstract

The number of entries in the Protein Data Bank (PDB) has doubled in the last decade and increased tenfold in the last twenty years. The availability of an ever-growing number of structures is having a huge impact on Structure-Based Drug Discovery (SBDD), allowing the investigation of new targets and making multiple structures of the same macromolecule in complex with different ligands available. Such a large resource often requires choosing the most suitable complex for molecular docking calculations, and this task is complicated by the plethora of posing algorithms and scoring functions available, which may influence the quality of the outcomes. Here, we report a large benchmark of seventeen popular docking protocols performed on the PDBbind database, covering more than four thousand entries. We found that, even in protein families wherein docking protocols generally showed acceptable results, certain ligand–protein complexes are poorly reproduced in the self-docking procedure. This trend is more pronounced in certain protein families, which underlines the importance of identifying a suitable protein–ligand conformation coupled to a well-performing docking protocol.

1. Introduction

Since its introduction in the early 1980s [1], molecular docking has helped computational medicinal chemists optimize the drug discovery process. Ten years later, thanks to methodological and technological advances, together with the increasing number of experimentally solved macromolecular structures, it became possible to process more and more molecules within a docking procedure, opening the era of Structure-Based Virtual Screening (SBVS) as a strategy for selecting appropriate compounds from large virtual libraries on the basis of good protein–ligand interaction patterns [2]. Thanks to molecular docking, the field of Structure-Based Drug Discovery (SBDD) has become very popular. A docking protocol can be described as the combination of a search algorithm, which samples the conformational space of a ligand and generates conformations (defined as poses) within a binding site, and a mathematical equation, called a scoring function, which quantitatively evaluates the quality of such poses. The scoring function has always been the Achilles heel of molecular docking because of its inaccuracy in quantifying the strength of the complex network of molecular interactions. Today, it is widely accepted that molecular docking has been outperformed by other structure-based in silico methodologies in investigating the stability and strength of the protein–ligand interaction [3], even though these are usually computationally demanding techniques. However, molecular docking still represents a valid technique for sampling the conformations of a ligand in a binding site very efficiently, at a fraction of the computational cost of more accurate methods based, for example, on Molecular Dynamics [4]. As evidence of the extensive adoption of molecular docking in research, more than 50 docking programs are currently listed in the online Click2Drug repository [5].
It should also be considered that each docking program usually provides more than one scoring function, whose performance ought to be evaluated in the protocol tuning step. This means that computational chemists have a plethora of different protocols at their disposal when they face a docking calculation and, more importantly, that the success of, for example, a Virtual Screening (VS) campaign strongly relies on the accuracy of the protocol employed to place and rank the conformations of candidates in a target binding site [6]. To further complicate matters, more and more experimental structures are becoming available, hence the number of possible combinations of protein conformation and docking protocol keeps growing. It is, therefore, clear that a crucial step in SBVS is the selection of a proper docking protocol and an appropriate protein conformation [7,8]. To address this issue, we recently proposed a platform, DockBench, with the aim of simplifying the non-trivial task of automatically comparing the performance of different docking protocols in a self-docking exercise. The criteria for selecting the most appropriate protocol are geometrical and statistical and rely on a few observables: the lowest and the average Root Mean Square Deviation (RMSD) obtained for the poses of the ligand compared to its crystallographic pose, and the protocol score [9]. In 2011, Plewczynski et al. reported a comparison among seven docking programs on the PDBbind database (http://www.pdbbind.org.cn), which at that time contained 1300 structures [8]. Here, we report a large benchmark of 17 different docking protocols compared on the basis of the self-docking procedure on a dataset of 4169 protein–ligand complexes.
The notable number of structures has offered the opportunity to evaluate the performance of molecular docking from different points of view, underlining how the efficiency of docking protocols may vary depending on the nature of the protein family.

2. Results

The benchmark was performed on 4169 structures obtained from PDBbind, a free database of binding affinity data for biomolecular complexes including protein–ligand, nucleic acid–ligand, protein–nucleic acid, and protein–protein complexes [10]. The PDBbind “Refined set” is a subset of high-quality protein–ligand complex structures helpful for the validation of docking protocols. All structures had to be processed prior to the docking calculation to keep only the protein and the ligand. This was necessary to simplify the execution on such a large set of complexes and protocols.
The preparation of the data was accomplished by an automatic procedure based on the Molecular Operating Environment (MOE) suite for proteins and the OpenEye toolkit for ligands (see the Materials and Methods section for details) [11,12]. The benchmark was executed with all 17 protocols implemented in DockBench 1.0.6, which are based on seven different docking programs, each coupled to different scoring functions whenever possible. The complete list of the protocols is reported in Table 1. The benchmark consisted of 70,873 single docking runs (4169 complexes × 17 protocols) executed on a single server. The wall time necessary to perform all docking runs was approximately 72 h.
The automated analysis was based on the calculation of three scores: (i) the minimum RMSD (RMSDmin), (ii) the average RMSD (RMSDave), and (iii) the number of poses with an RMSD lower than the crystallographic resolution R (N(RMSD < R)), plus a fourth score, named Protocol Score (Pscore), that summarizes the overall performance from a geometric point of view. The Pscore is defined as follows: one point is assigned to the protocols that have an RMSDave lower than the value of the crystallographic resolution, another point is assigned to the protocols producing at least 10 poses (50% of the generated conformations) with an RMSD (compared to the crystallographic geometry) lower than the crystallographic resolution, and two points are assigned to protocols which fulfill both the previous conditions. The complete matrix of the results is available in the supporting information. The observed RMSDmin values ranged from 0.05 to 38.49 Å. High RMSDmin values are symptomatic of ligands placed far away from the native binding site. A possible explanation is that the pocket was defined using a sphere with a radius of 15 Å. The radius was deliberately set large so that the site would be sufficiently broad for all the ligands in the dataset, but this may be problematic for the docking of small ligands or in the case of multiple closely located pockets.
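The scoring scheme described above can be sketched in a few lines of Python. This is only an illustrative reading, not the DockBench implementation: the function names are hypothetical, and the sketch assumes that each satisfied criterion contributes one point, so that a run fulfilling both criteria totals two points.

```python
from statistics import mean


def protocol_score(pose_rmsds, resolution, min_fraction=0.5):
    """Illustrative Pscore (0, 1, or 2) for one protocol-complex pair.

    Assumption (not confirmed by the source): one point per satisfied
    criterion, so meeting both criteria gives a total of two points.
    """
    if not pose_rmsds:
        return 0
    score = 0
    # Criterion 1: average RMSD below the crystallographic resolution.
    if mean(pose_rmsds) < resolution:
        score += 1
    # Criterion 2: at least half of the poses (10 out of 20) below the resolution.
    n_below = sum(1 for r in pose_rmsds if r < resolution)
    if n_below >= min_fraction * len(pose_rmsds):
        score += 1
    return score


def summarize_run(pose_rmsds, resolution):
    """Collect the four observables discussed in the text for one docking run."""
    return {
        "rmsd_min": min(pose_rmsds),
        "rmsd_ave": mean(pose_rmsds),
        "n_below_resolution": sum(1 for r in pose_rmsds if r < resolution),
        "pscore": protocol_score(pose_rmsds, resolution),
    }
```

With 20 poses per run, `min_fraction=0.5` reproduces the "at least 10 poses" condition; the RMSD values passed in are expected in Å, like the resolution.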
An interesting question we considered concerned the performance of docking protocols in different target families since, in PDBbind, many protein families are represented by several entries. The results were grouped by protein family (PF) using the Pfam (Protein Family) database families as the definition [13]. For each complex, the Pfam code was retrieved for the protein chain and used for grouping. For many multi-domain proteins, a different Pfam code can be assigned depending on the domain solved in the structure; for instance, the proteins belonging to the family PF00069 (Protein Kinase) often contain domains labeled as PF02827 (Cyclic adenosine monophosphate-dependent protein kinase inhibitor), PF00134 (Cyclin, N-terminal domain), PF02984 (Cyclin, C-terminal domain), and a few others. Some proteins cannot be classified in a single group, and therefore we merged those groups for analysis (for example, PF00183 and PF02518, Heat Shock Protein 90, HSP90, and GHKL domain). We then compared the docking performance by the Protocol Score (Pscore) for the major clusters to investigate whether the performance of the different protocols varies among the different protein families.
Unexpectedly, the performance among different families showed a remarkable fluctuation (Table 2), with certain families having many protocols with Pscore > 1 on most of the complexes. It is interesting to note that, between the best performing group (PF00104) and the worst one (PF00026), the percentage of protocols with Pscore > 1 differed by an order of magnitude: 41.66% and 4.37%, respectively. The families with a good Pscore were: PF00104 (Hormone receptors), PF00497 (Bacterial extracellular solute-binding proteins, family 3), PF10613 (Ligated ion channel l-glutamate and glycine binding site), and PF01048 (Phosphorylase superfamily). All these families showed a Pscore > 1 in more than 29% of the docking runs.
On the other hand, we found that certain families had very poor results, with Pscore > 1 found below 10%; this is the case for PF00194 (Eukaryotic-type carbonic anhydrase), PF00077 (Retroviral aspartyl proteases), PF00413 (Matrixin), and PF00026 (Eukaryotic aspartyl protease). The trend observed for Pscore is also evident in RMSDave.
The results for the most populated families are reported in Figure 1. The Pscores are reported as a heatmap to summarize such a big matrix at a glance (higher scores highlight better protocol-complex pairs). Numerical results are reported in the supplementary information. The results for the same families in terms of RMSDave are reported in Figure 2.
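The per-family percentages discussed above amount to a simple group-and-count operation over the docking runs. The following minimal sketch shows one way to compute them; the record layout `(pfam_id, protocol, pscore)` and the function name are assumptions made for illustration and do not reflect the actual DockBench output format.

```python
from collections import defaultdict


def percent_high_pscore(runs, threshold=1):
    """Percentage of docking runs with Pscore > threshold for each Pfam family.

    `runs` is an iterable of (pfam_id, protocol, pscore) tuples; this layout
    is a hypothetical one chosen for the example.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for pfam_id, _protocol, pscore in runs:
        totals[pfam_id] += 1
        if pscore > threshold:
            hits[pfam_id] += 1
    return {fam: 100.0 * hits[fam] / totals[fam] for fam in totals}


# Toy records, invented for the example (not benchmark data).
toy_runs = [
    ("PF00104", "GOLD-plp", 2),
    ("PF00104", "VINA-std", 1),
    ("PF00026", "GOLD-plp", 0),
    ("PF00026", "VINA-std", 0),
]
family_percentages = percent_high_pscore(toy_runs)
```

Families that were merged for the analysis (for example PF00183 and PF02518) would need to be mapped to a common label before calling the function.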
A further aspect that was considered was the ability of a docking protocol to rank in the first position, according to its scoring function, the pose with the lowest RMSD. This aspect is particularly relevant because it indicates whether the protocol is able to distinguish between different binding modes and, hopefully, to prioritize a binding mode close to the experimentally observed one. Figures S1 and S2 show heatmaps of the docking runs in which the best-scored pose is also the conformation with the lowest RMSD. Unfortunately, this simultaneous occurrence did not always guarantee the identification of a near-native pose. Indeed, we observed several cases where the lowest-RMSD conformation was far from the experimentally solved one, with RMSD values larger than 10 Å. The RMSD values of the best conformations are reported on the heatmaps in Figures S3 and S4. Therefore, we performed a further analysis, investigating when the best-scored pose had a low RMSD value, but not necessarily the lowest one. We set a threshold of 1.5 Å to define a near-native pose. In this way, we could highlight protocols able to place a “good” pose as the first solution, even if a potentially better conformation was present among the 20 obtained. The runs that fulfill this condition are reported in Figures S5 and S6. Again, the performance of the docking protocols differed considerably depending on the protein family and, interestingly, agreed with the Pscore trends. The Ligand-binding domain of nuclear hormone receptor (PF00104) showed an RMSD < 1.5 Å for the first pose in 50% of the runs. The percentage of success is also remarkable for the Ligated ion channel l-glutamate- and glycine-binding site (PF10613), 49.3%; the Bacterial extracellular solute-binding proteins (PF00497), 47.6%; and the Phosphorylase superfamily (PF01048), 41.7%.
On the contrary, certain families performed poorly in this analysis, in particular Eukaryotic-type carbonic anhydrase, which showed a success rate of only 10.8% (Table S1, Supplementary Materials).
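The two checks described above (whether the best-scored pose is also the lowest-RMSD pose, and whether it is near-native) can be sketched as follows. The function name and the `(score, rmsd)` pose layout are hypothetical, and the score direction is an explicit assumption: some programs report energies where lower is better, others fitness values where higher is better, so the flag must be set per protocol.

```python
def top_pose_checks(poses, cutoff=1.5, lower_is_better=True):
    """Evaluate the best-scored pose of a single docking run.

    `poses` is a list of (score, rmsd) tuples, with RMSD in Angstrom.
    `lower_is_better` is an assumption about the scoring function and
    has to be chosen for each docking protocol.
    """
    if not poses:
        raise ValueError("at least one pose is required")
    pick = min if lower_is_better else max
    _top_score, top_rmsd = pick(poses, key=lambda p: p[0])
    best_rmsd = min(rmsd for _score, rmsd in poses)
    return {
        "top_rmsd": top_rmsd,
        # Best-scored pose coincides with the lowest-RMSD pose.
        "top_is_lowest_rmsd": top_rmsd == best_rmsd,
        # Best-scored pose lies within the near-native threshold.
        "top_is_near_native": top_rmsd < cutoff,
    }
```

The second test case below mirrors the situation noted in the text: the best-scored pose can be the lowest-RMSD one and still be far from the crystallographic geometry, which is why the 1.5 Å criterion is evaluated separately.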
The factors that so dramatically affect the quality of the docking outputs among different families could be related to many variables. First, we addressed the possibly different chemical nature of the ligands belonging to each protein family. To evaluate the ligand chemical space, several molecular descriptors were calculated, including molecular weight, rotatable bonds, hydrogen bond acceptors, hydrogen bond donors, clogP, total polar surface area, and van der Waals volume. To reduce the number of dimensions, and therefore make the distribution representable in a three-dimensional plot, a Principal Component Analysis (PCA) was performed. As can be seen in Figure 3, ligands of the different clusters do not seem to occupy different portions of the chemical space. Hence, we moved our attention to possible players removed during the complex preparation, considering that the poor performance of docking in the clusters PF00077 (Retroviral aspartyl proteases) and PF00439 (Bromodomain) could be ascribed to the removal of the crystallographic waters. It was already reported that, for bromodomains, the binding mode of several ligands is mediated by a series of water molecules [14].
Similarly, for the performance observed in cluster PF00194 (Carbonic Anhydrases), a crucial aspect could be the removal of the zinc ion from the binding site.
For this reason, we performed a further benchmark focused on this family, including the zinc ion and employing the most promising protocols of the first benchmark, namely the PLANTS- and GOLD-based protocols. The comparison of the Pscore heatmaps reported in Figure 4 shows that, despite the introduction of the zinc ion, the Pscore improves only moderately. Surprisingly, the distribution of the high Pscores is different in the two benchmarks, suggesting that introducing the zinc ion improves the results for certain complex structures while worsening them for others.
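The dimensionality reduction applied to the seven ligand descriptors earlier in this section can be illustrated with a short NumPy sketch. It is not the authors' script: the function name is hypothetical, and standardizing each descriptor before the decomposition is an assumption, since the text does not state how the descriptors were scaled.

```python
import numpy as np


def pca_project(descriptors, n_components=3):
    """Project a (ligands x descriptors) matrix onto its first principal components.

    Assumption: descriptors are standardized (zero mean, unit variance)
    before the decomposition; the source does not specify the scaling.
    """
    x = np.asarray(descriptors, dtype=float)
    mu = x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    z = (x - mu) / sd
    # Singular value decomposition of the standardized matrix.
    u, s, _vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]
```

The returned scores give three coordinates per ligand, suitable for a three-dimensional scatter plot colored by protein family, and the second return value reports the fraction of variance captured by each retained component.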

3. Discussion

A computational chemist has to ask many questions when facing a molecular docking study, and the answers are not univocal. Of course, the choice of the best-performing protocol and, when multiple structures are available, of the target conformation is the most significant decision. However, molecular docking may be employed for different purposes, and a proficient protocol choice must take the intended use into account. If molecular docking is used for binding mode studies, the protocol performance should have priority. At the same time, the choice of the protein structure should depend on the similarity between the compounds to be studied and the co-crystallized ligand. When molecular docking is used in a VS campaign, more variables affect the selection, such as the execution speed. The results of this benchmark were obtained with parameters as close as possible to the default values, resulting in very variable execution times. For instance, as already reported in previous DockBench studies, certain protocols may require an order of magnitude more time than faster protocols. For large libraries this may represent a critical issue; hence, among protocols with similar outcomes in the self-docking procedure, the choice can be influenced by the execution speed. In our benchmark we observed, for example, that in certain protein families several protocols showed good performance, so protocol selection may depend on such other factors. It is interesting to note that, even in the protein families in which molecular docking shows a good trend in reproducing the experimental conformation, certain protein–ligand complexes are far from being predicted correctly, suggesting the importance of excluding them from docking simulations.
In contrast, other protein families are challenging targets for which the choice of the posing-scoring algorithm seems to be crucial, as is the identification of the most suitable complex structure. The poor performance on such challenging targets should also prompt an investigation of the issues affecting the docking calculation, for instance by considering the role of stable water molecules in the binding site, the role of a cofactor, flexible regions of the pocket, or other peculiarities of the system. This study may help users approaching a new target by molecular docking to identify promising protocols and to exclude problematic complex structures. In our opinion, the assessment of a suitable procedure should become good practice, also in light of the increasing number of entries available in the PDB and of the advent of novel techniques, like Cryo-EM and Solid-State NMR, that are widening the landscape of experimentally solved targets.

4. Materials and Methods

4.1. Database Preparation

The Refined set of the PDBbind database was obtained from the PDBbind web service (http://www.pdbbind.org.cn/) [10]. This dataset is composed of 4463 protein–ligand complexes, and 4169 of them were used for this work. We excluded 294 structures containing peptide–protein complexes, which are not particularly suitable for the DockBench protocols, since these use docking settings as close as possible to the default parameters provided by the developers of each program, which are mostly calibrated on small organic molecules typical of drug discovery.
These 4169 complexes were prepared as described below.
The protein structures were prepared with a Scientific Vector Language (SVL) script that uses functions of the MOE suite [11] to reproduce the MOE protein preparation tool and fix crystal structure issues, such as predicting the coordinates of missing atoms in partially solved residues. Co-crystallized solvent molecules and impurities (such as co-solvents) were removed, and only protein and ligand coordinates were retained. For all ligands, the most favorable ionic state was calculated with the OpenEye tool fixpKa [12]. The partial charges were assigned with molcharge, also part of the OpenEye toolkit [12]. Ligand geometries were minimized in the first step of DockBench with Open Babel routines using the MMFF94 force field [15].

4.2. Benchmark: Software and Hardware

The benchmark was performed with DockBench 1.0.6 [16,17], running on a single HP ProLiant DL585 G7 server equipped with four AMD Opteron 6282 processors, for a total of 64 CPU cores. Docking protocols were executed according to the original implementation already reported [16]. All the 17 protocols from seven different programs (AutoDock 4.2.5.1 [18], Vina 1.1.2 [19], PLANTS 1.2 [20], rDock [21], Glide 6.5 [22], Gold 5.4.1 [23,24], and MOE 2019.01 [11]) were included in the benchmark and run on all 4169 protein–ligand complexes. Briefly, 20 poses were generated for every single run. The binding site was defined using a sphere with a radius of 15 Å centered on the center of mass of the co-crystallized ligand present in the complex. An RMSD threshold of 1 Å was used to define unique poses.
The analysis was performed with the DockBench analyzer coupled to external Python and Bash scripts to manage the notable amount of data and to produce the plots [25,26]. The Pfam protein family was retrieved for each protein using the RCSB PDB REST API service [27], while the Pfam clan was obtained from the Pfam REST API service [13]. Molecular descriptors were calculated using the MOE suite [11].
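The geometric settings described above (a 15 Å sphere centered on the ligand center of mass, and a 1 Å RMSD threshold for unique poses) can be sketched with the Python standard library alone. All function names here are hypothetical; the mass table covers only a few common elements, and the RMSD assumes identical atom ordering, with no superposition and no symmetry correction. How each docking program actually filters duplicate poses is not stated in the text, so the greedy filter is an assumption.

```python
import math

# Approximate standard atomic masses for a few common elements (illustrative subset).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms):
    """atoms: list of (element, x, y, z) tuples; returns the (x, y, z) center of mass."""
    total = sum(ATOMIC_MASS[a[0]] for a in atoms)
    return tuple(sum(ATOMIC_MASS[a[0]] * a[i] for a in atoms) / total for i in (1, 2, 3))


def in_binding_site(point, center, radius=15.0):
    """True if a point lies inside the spherical binding-site definition."""
    return math.dist(point, center) <= radius


def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equally ordered lists of (x, y, z) coordinates."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))


def unique_poses(poses, threshold=1.0):
    """Greedy filter: keep a pose only if it is at least `threshold` from every kept pose."""
    kept = []
    for pose in poses:
        if all(rmsd(pose, k) >= threshold for k in kept):
            kept.append(pose)
    return kept
```

A large sphere guarantees that every ligand in the dataset fits inside the search space, at the cost of admitting poses in neighboring pockets, which is consistent with the high RMSDmin values reported in the Results.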

Supplementary Materials

The complete table of results is provided as an Excel file (SI_DockBenck_resuld.xls). Supplementary materials can be found at https://www.mdpi.com/1422-0067/20/14/3558/s1.

Author Contributions

Conceptualization, G.B. and A.C.; methodology, A.C., M.S.; validation, G.B., M.B.; writing—original draft preparation, G.B. and M.S.; writing—review and editing, S.M.; supervision, M.S., and S.M.

Funding

This research received no external funding.

Acknowledgments

MMS lab is very grateful to Chemical Computing Group, OpenEye, and Acellera for the scientific and technical partnership. MMS lab gratefully acknowledges the support of NVIDIA Corporation with the donation of the Titan V GPU, used for this research.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

SB: Structure-Based
VS: Virtual Screening
Pfam: Protein Family
NMR: Nuclear Magnetic Resonance
Cryo-EM: Cryo-Electron Microscopy
MMFF94: Merck Molecular Force Field
RMSD: Root Mean Square Deviation

References

  1. Kuntz, I.D.; Blaney, J.M.; Oatley, S.J.; Langridge, R.; Ferrin, T.E. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 1982, 161, 269–288.
  2. Horvath, D. A virtual screening approach applied to the search for trypanothione reductase inhibitors. J. Med. Chem. 1997, 40, 2412–2423.
  3. Mobley, D.L.; Dill, K.A. Binding of small-molecule ligands to proteins: “What you see” is not always “what you get”. Structure 2009, 17, 489–498.
  4. Moro, S.; Sturlese, M.; Ciancetta, A.; Floris, M. In silico 3D modeling of binding activities. Methods Mol. Biol. 2016, 1425, 23–35.
  5. Directory of in Silico Drug Design Tools - Docking. Available online: http://www.click2drug.org/directory_Docking.html (accessed on 25 May 2016).
  6. Houston, D.R.; Walkinshaw, M.D. Consensus docking: improving the reliability of docking in a virtual screening context. J. Chem. Inf. Model. 2013, 53, 384–390.
  7. Salmaso, V.; Sturlese, M.; Cuzzolin, A.; Moro, S. DockBench as docking selector tool: the lesson learned from D3R Grand Challenge 2015. J. Comput. Aided Mol. Des. 2016, 30, 773–789.
  8. Plewczynski, D.; Łaźniewski, M.; Augustyniak, R.; Ginalski, K. Can we trust docking results? Evaluation of seven commonly used programs on PDBbind database. J. Comput. Chem. 2011, 32, 742–755.
  9. Ciancetta, A.; Cuzzolin, A.; Moro, S. Alternative quality assessment strategy to compare performances of GPCR-ligand docking protocols: the human adenosine A(2A) receptor as a case study. J. Chem. Inf. Model. 2014, 54, 2243–2254.
  10. Wang, R.; Fang, X.; Lu, Y.; Yang, C.Y.; Wang, S. The PDBbind database: methodologies and updates. J. Med. Chem. 2005, 48, 4111–4119.
  11. Chemical Computing Group (CCG) Inc. Molecular Operating Environment (MOE); Chemical Computing Group: Montreal, QC, Canada, 2019.
  12. OpenEye Scientific Software Inc. OEChem; OpenEye Scientific Software Inc.: Santa Fe, NM, USA, 2016.
  13. El-Gebali, S.; Mistry, J.; Bateman, A.; Eddy, S.R.; Luciani, A.; Potter, S.C.; Qureshi, M.; Richardson, L.J.; Salazar, G.A.; Smart, A.; et al. The Pfam protein families database in 2019. Nucleic Acids Res. 2019, 47, D427–D432.
  14. Shadrick, W.R.; Slavish, P.J.; Chai, S.C.; Waddell, B.; Connelly, M.; Low, J.A.; Tallant, C.; Young, B.M.; Bharatham, N.; Knapp, S.; et al. Exploiting a water network to achieve enthalpy-driven, bromodomain-selective BET inhibitors. Bioorg. Med. Chem. 2018, 26, 25–36.
  15. Halgren, T.A. Merck molecular force field. I. Basis, form, scope, parameterization, and performance of MMFF94. J. Comput. Chem. 1996, 17, 490–519.
  16. Cuzzolin, A.; Sturlese, M.; Malvacio, I.; Ciancetta, A.; Moro, S. DockBench: An Integrated Informatic Platform Bridging the Gap between the Robust Validation of Docking Protocols and Virtual Screening Simulations. Molecules 2015, 20, 9977–9993.
  17. Salmaso, V.; Sturlese, M.; Cuzzolin, A.; Moro, S. Combining self- and cross-docking as benchmark tools: the performance of DockBench in the D3R Grand Challenge 2. J. Comput. Aided Mol. Des. 2018, 32, 251–264.
  18. Morris, G.M.; Huey, R.; Lindstrom, W.; Sanner, M.F.; Belew, R.K.; Goodsell, D.S.; Olson, A.J. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem. 2009, 30, 2785–2791.
  19. Trott, O.; Olson, A.J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 2010, 31, 455–461.
  20. Korb, O.; Stützle, T.; Exner, T.E.; Dorigo, M.; Gambardella, L.M.; Birattari, M.; Martinoli, A.; Poli, R.; Hutchison, D.; Kanade, T.; et al. PLANTS: Application of Ant Colony Optimization to Structure-Based Drug Design. In Ant Colony Optimization and Swarm Intelligence; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4150, pp. 247–258.
  21. Ruiz-Carmona, S.; Alvarez-Garcia, D.; Foloppe, N.; Garmendia-Doval, A.B.; Juhos, S.; Schmidtke, P.; Barril, X.; Hubbard, R.E.; Morley, S.D. rDock: A fast, versatile and open source program for docking ligands to proteins and nucleic acids. PLoS Comput. Biol. 2014, 10, e1003571.
  22. Halgren, T.A.; Murphy, R.B.; Friesner, R.A.; Beard, H.S.; Frye, L.L.; Pollard, W.T.; Banks, J.L. Glide: A new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 2004, 47, 1750–1759.
  23. Verdonk, M.L.; Cole, J.C.; Hartshorn, M.J.; Murray, C.W.; Taylor, R.D. Improved protein-ligand docking using GOLD. Proteins 2003, 52, 609–623.
  24. Cambridge Crystallographic Data Centre. GOLD Suite, version 5.2; Cambridge Crystallographic Data Centre: Cambridge, UK, 2013.
  25. Hunter, J.D. Matplotlib: A 2D Graphics Environment. Comput. Sci. Eng. 2007, 9, 90–95.
  26. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
  27. Berman, H.M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T.N.; Weissig, H.; Shindyalov, I.N.; Bourne, P.E. The protein data bank. Nucleic Acids Res. 2000, 28, 235–242.
Figure 1. DockBench results divided by Pfam protein families. The heatmaps are color-coded according to the Pscore. The ten families in panel (a) are: PF00439, Bromodomain; PF10613, Ligated ion channel l-glutamate and glycine-binding site; PF00102, Protein tyrosine phosphatases; PF00061, Lipocalin; PF00497, Bacterial extracellular solute-binding proteins family 3; PF00104, Hormone receptors; PF00026, Eukaryotic aspartyl protease; PF00413, Peptidase M_10; PF01048, Phosphorylase superfamily; and PF00233, 3′5′-cyclic nucleotide phosphodiesterases. The six families in panel (b) are: PF00089, Trypsin; PF14670, Coagulation Factor Xa inhibitory site; PF09396, Thrombin light chain; PF00077, Retroviral aspartyl proteases; PF00194, carbonic anhydrases; PF00069, protein kinase; PF07714, tyrosine kinase; PF02518, GHKL domain; and PF00183, HSP90.
Figure 2. DockBench results divided by Pfam protein families. The heatmaps in both panels are color-coded according to the RMSDave. The ten families in panel (a) are: PF00439, Bromodomain; PF10613, Ligated ion channel l-glutamate and glycine-binding site; PF00102, Protein tyrosine phosphatases; PF00061, Lipocalin; PF00497, Bacterial extracellular solute-binding proteins family 3; PF00104, Hormone receptors; PF00026, Eukaryotic aspartyl protease; PF00413, Peptidase M_10; PF01048, Phosphorylase superfamily; and PF00233, 3′5′-cyclic nucleotide phosphodiesterases. The six families in panel (b) are: PF00089, Trypsin; PF14670, Coagulation Factor Xa inhibitory site; PF09396, Thrombin light chain; PF00077, Retroviral aspartyl proteases; PF00194, carbonic anhydrases; PF00069, protein kinase; PF07714, tyrosine kinase; PF02518, GHKL domain; and PF00183, HSP90.
Figure 3. Principal Component Analysis (PCA) of seven molecular descriptors for the groups of ligands, based on the protein families in Table 2. The PCA of the ligands from the protein families was split into two groups according to the same division on Figure 1b (a) and Figure 2b (b). The descriptors used in the analysis are molecular weight, rotatable bonds, hydrogen bond acceptors, hydrogen bond donors, clogP, total polar surface area, and van der Waals volume.
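A PCA of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: the descriptor column names are hypothetical labels for the seven descriptors listed in the caption, the ligand values are random placeholders rather than data from the benchmark, and autoscaling each descriptor before the decomposition is an assumption, since the preprocessing used for the figure is not stated here.

```python
import numpy as np

# Hypothetical labels for the seven descriptors named in the caption.
DESCRIPTORS = [
    "weight", "rotatable_bonds", "hb_acceptors", "hb_donors",
    "clogp", "tpsa", "vdw_volume",
]


def pca(descriptor_matrix, n_components=2):
    """PCA via SVD on autoscaled descriptors (rows: ligands, columns: descriptors)."""
    x = np.asarray(descriptor_matrix, dtype=float)
    std = x.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant descriptors
    z = (x - x.mean(axis=0)) / std
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # ligand coordinates
    loadings = vt[:n_components]                      # descriptor weights
    explained = (s ** 2) / np.sum(s ** 2)             # variance fractions
    return scores, loadings, explained[:n_components]


# Placeholder data: 50 ligands with random descriptor values (not real data).
rng = np.random.default_rng(0)
ligands = rng.normal(size=(50, len(DESCRIPTORS)))

scores, loadings, explained = pca(ligands)
print(scores.shape, loadings.shape, explained)
```

Plotting the first two columns of `scores`, colored by protein family, would give a scatter plot of the same type as the one in the figure.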
Figure 4. Comparison between DockBench Results in terms of Protocol Score for cluster PF00194 (carbonic anhydrases) with (b) and without (a) the Zinc ion.
Table 1. List of docking protocols used in the benchmark.
| Program | Search Algorithm/Placing Method | Scoring Function | Protocol Abbreviation |
|---|---|---|---|
| AutoDock 4.2 | Local Search | AutoDock SF | AUTODOCK-ls |
| AutoDock 4.2 | Lamarckian GA | AutoDock SF | AUTODOCK-lga |
| AutoDock 4.2 | Genetic Algorithm | AutoDock SF | AUTODOCK-ga |
| Vina 1.1.2 | Monte Carlo + BFGS local search | Standard Vina SF | VINA-std |
| Glide 6.5 | Glide Algorithm | Standard Precision | GLIDE-sp |
| GOLD 5.4.1 | Genetic Algorithm | Goldscore | GOLD-goldscore |
| GOLD 5.4.1 | Genetic Algorithm | Chemscore | GOLD-chemscore |
| GOLD 5.4.1 | Genetic Algorithm | ASP | GOLD-asp |
| GOLD 5.4.1 | Genetic Algorithm | PLP | GOLD-plp |
| MOE 2019.01 | Triangle Matcher | London-dG | MOE-londondg |
| MOE 2019.01 | Triangle Matcher | Affinity-dG | MOE-affinitydg |
| MOE 2019.01 | Triangle Matcher | GBIVIWSA | MOE-gbiviwsa |
| PLANTS 1.2 | ACO Algorithm | PLP | PLANTS-plp |
| PLANTS 1.2 | ACO Algorithm | PLP95 | PLANTS-plp95 |
| PLANTS 1.2 | ACO Algorithm | ChemPLP | PLANTS-chemplp |
| rDock 2013.1 | Genetic Algorithm + Monte Carlo + Simplex minimization | Standard rDock master SF | RDOCK-std |
| rDock 2013.1 | Genetic Algorithm + Monte Carlo + Simplex minimization | Standard rDock master SF + desolvation potential | RDOCK-solv |
GA (Genetic Algorithm), BFGS (Broyden-Fletcher-Goldfarb-Shanno), ASP (Astex Statistical Potential), PLP (Piecewise Linear Potential), ACO (Ant Colony Optimization).
Table 2. Summary of benchmark results by Pfam families. Protocol scores are reported as percentage with respect to the total docking runs (Pscore%).
| Pfam Family | Protein Description | Size | Pscore% 0 | Pscore% 1 | Pscore% 2 | Pscore% 3 | Pscore% ≥1 |
|---|---|---|---|---|---|---|---|
| PF00104 | Ligand-binding domain of nuclear hormone receptor | 85 | 59.34 | 10.24 | 26.57 | 4.84 | 41.66 |
| PF00497 | Bacterial extracellular solute-binding proteins, family 3 | 38 | 59.29 | 9.44 | 25.70 | 5.57 | 40.71 |
| PF10613 | Ligated ion channel l-glutamate- and glycine-binding site | 83 | 67.97 | 6.80 | 20.55 | 4.68 | 32.03 |
| PF01048 | Phosphorylase superfamily | 47 | 70.09 | 8.01 | 16.77 | 5.13 | 29.91 |
| PF00102 | Protein-tyrosine phosphatase | 52 | 79.30 | 5.20 | 11.99 | 3.51 | 20.70 |
| PF00069 | Protein kinase domain | 207 | 80.68 | 5.43 | 10.46 | 3.43 | 19.32 |
| PF00061 | Lipocalin/cytosolic fatty-acid binding protein family | 49 | 82.11 | 4.08 | 10.80 | 3.00 | 17.88 |
| PF02518, PF00183 | Hsp90 protein and GHKL domain | 89 | 82.74 | 5.35 | 8.26 | 3.64 | 17.25 |
| PF07714 | Protein tyrosine kinase | 133 | 83.90 | 5.79 | 6.77 | 3.54 | 16.10 |
| PF00089, PF14670, PF09396 | Trypsin | 330 | 85.54 | 4.65 | 6.84 | 2.96 | 14.45 |
| PF00233 | 3′5′-cyclic nucleotide phosphodiesterase | 37 | 87.92 | 3.82 | 5.41 | 2.86 | 12.08 |
| PF00439 | Bromodomain | 112 | 90.02 | 2.89 | 4.67 | 2.42 | 9.98 |
| PF00026 | Eukaryotic aspartyl protease | 73 | 90.49 | 3.14 | 4.11 | 2.26 | 9.51 |
| PF00413 | Matrixin | 49 | 90.88 | 3.24 | 4.20 | 1.68 | 9.12 |
| PF00077 | Retroviral aspartyl protease | 301 | 95.41 | 2.27 | 1.64 | 0.68 | 4.59 |
| PF00194 | Eukaryotic-type carbonic anhydrase | 273 | 95.63 | 2.28 | 1.17 | 0.91 | 4.37 |
Pfam (Protein Family), Hsp90 (Heat shock protein 90).
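The last Pscore% column of Table 2 appears to equal the sum of the columns for protocol scores 1, 2, and 3, so that it complements the score 0 column to 100%. The short check below illustrates this reading on four rows copied from the table. The function name `check_row` and the rounding tolerance of 0.02 percentage points are assumptions made for this sketch.

```python
# Rows copied from Table 2: (score 0, score 1, score 2, score 3, last column).
rows = {
    "PF00497": (59.29, 9.44, 25.70, 5.57, 40.71),
    "PF00069": (80.68, 5.43, 10.46, 3.43, 19.32),
    "PF00077": (95.41, 2.27, 1.64, 0.68, 4.59),
    "PF00194": (95.63, 2.28, 1.17, 0.91, 4.37),
}


def check_row(values, tolerance=0.02):
    """True if the last column matches scores 1+2+3 and complements score 0."""
    score0, score1, score2, score3, reported = values
    nonzero = score1 + score2 + score3
    sums_match = abs(nonzero - reported) <= tolerance
    complements = abs(score0 + reported - 100.0) <= tolerance
    return sums_match and complements


results = {family: check_row(values) for family, values in rows.items()}
for family, ok in results.items():
    print(family, "consistent" if ok else "inconsistent")
```

The tolerance only absorbs the rounding of the published percentages to two decimals.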

Bolcato, G.; Cuzzolin, A.; Bissaro, M.; Moro, S.; Sturlese, M. Can We Still Trust Docking Results? An Extension of the Applicability of DockBench on PDBbind Database. Int. J. Mol. Sci. 2019, 20, 3558. https://doi.org/10.3390/ijms20143558