Open Access Article
Molecules 2015, 20(6), 10947-10962

Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest

Department of Computer Science and Engineering, Chinese University of Hong Kong, Sha Tin, New Territories 999077, Hong Kong
Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
Author to whom correspondence should be addressed.
Academic Editor: Peter Willett
Received: 13 March 2015 / Revised: 4 June 2015 / Accepted: 9 June 2015 / Published: 12 June 2015
(This article belongs to the Special Issue Chemoinformatics)


Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental to its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to what extent training a scoring function with low-quality structural and binding data is detrimental to predictive performance. We found that low-quality data is not only non-detrimental but beneficial for the predictive performance of machine-learning scoring functions, although the improvement is smaller than that obtained from high-quality data. Furthermore, we observed that classical scoring functions cannot effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting training to a smaller set of higher-quality data.
Keywords: docking; binding affinity prediction; machine-learning scoring functions
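The kind of experiment described in the abstract can be pictured as training the same scoring function on progressively more permissive training sets. The following is a minimal sketch of how such nested sets might be assembled. The record fields (`resolution`, `measure`), the cutoff values, and the toy complexes are all illustrative assumptions, not the protocol or data used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Complex:
    """One protein-ligand complex (all fields are hypothetical)."""
    pdb_id: str        # placeholder identifier, not a real PDB entry
    resolution: float  # structure resolution in angstroms (lower is better)
    measure: str       # affinity measurement type: "Kd", "Ki" or "IC50"
    affinity: float    # binding affinity on a -log10 scale


def nested_training_sets(complexes, cutoffs):
    """Return one training set per (max_resolution, allowed_measures) cutoff.

    Cutoffs are expected to run from strictest to most permissive, so each
    set contains the previous one plus additional lower-quality complexes.
    """
    sets = []
    for max_res, allowed in cutoffs:
        sets.append([c for c in complexes
                     if c.resolution <= max_res and c.measure in allowed])
    return sets


# Toy records, invented for illustration only.
data = [
    Complex("aaaa", 1.8, "Kd", 6.2),
    Complex("bbbb", 2.4, "Ki", 7.9),
    Complex("cccc", 2.9, "IC50", 5.1),
    Complex("dddd", 3.4, "Kd", 4.3),
    Complex("eeee", 2.2, "IC50", 8.0),
]

# Assumed quality cutoffs, strictest first.
cutoffs = [
    (2.5, {"Kd", "Ki"}),
    (3.0, {"Kd", "Ki"}),
    (3.0, {"Kd", "Ki", "IC50"}),
    (float("inf"), {"Kd", "Ki", "IC50"}),
]

training_sets = nested_training_sets(data, cutoffs)
for (max_res, allowed), subset in zip(cutoffs, training_sets):
    print(max_res, sorted(allowed), [c.pdb_id for c in subset])
```

Under this setup, each training set would be used to fit the same model and evaluated on one fixed test set, so that any change in performance can be attributed to the added lower-quality data rather than to a change in the benchmark.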


This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


MDPI and ACS Style

Li, H.; Leung, K.-S.; Wong, M.-H.; Ballester, P.J. Low-Quality Structural and Interaction Data Improves Binding Affinity Prediction via Random Forest. Molecules 2015, 20, 10947-10962.

Molecules, EISSN 1420-3049, published by MDPI AG, Basel, Switzerland