
Open Access | Feature Paper | Article
Biomolecules 2018, 8(1), 12; https://doi.org/10.3390/biom8010012

The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction

1 SDIVF R&D Centre, Hong Kong Science Park, Sha Tin, New Territories, Hong Kong, China
2 Institute of Future Cities, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China
3 Department of Computer Science and Engineering, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China
4 School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China
5 School of Biomedical Sciences, The Chinese University of Hong Kong, Sha Tin, New Territories, Hong Kong, China
6 Cancer Research Center of Marseille, INSERM U1068, F-13009 Marseille, France
7 Institut Paoli-Calmettes, F-13009 Marseille, France
8 Aix-Marseille Université, F-13284 Marseille, France
9 CNRS UMR7258, F-13009 Marseille, France
* Author to whom correspondence should be addressed.
Received: 8 February 2018 / Revised: 9 March 2018 / Accepted: 12 March 2018 / Published: 14 March 2018
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)

Abstract

It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with proteins highly similar to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF, X-Score, which has the best test set performance out of 16 classical SFs. We have found that the random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 keeps learning as the training set grows, becoming substantially more predictive than X-Score when all 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with proteins dissimilar to those in the test set, contrary to what had previously been concluded using the same data. Given that a growing amount of structural and interaction data will become available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to widen in the future.
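To make the idea of similarity-based training sets concrete, here is a minimal Python sketch under stated assumptions, not the authors' code. It assumes each training complex has a precomputed similarity score in [0, 1] (structure- or sequence-based) to the protein of each test complex, and that the training set for a given cutoff keeps only complexes whose highest similarity to any test protein does not exceed that cutoff. The helper names (`similarity_filtered_training_sets`, `fraction_removed`, `pearson`), the matrix layout, and the use of Pearson correlation as the performance measure are illustrative assumptions.

```python
from math import sqrt
from statistics import mean


def similarity_filtered_training_sets(train_ids, sim, cutoffs):
    """Build one training set per similarity cutoff (hypothetical helper).

    sim[i][j] is the assumed similarity in [0, 1] between the protein of
    training complex i and the protein of test complex j. A training
    complex is kept for a cutoff only if its highest similarity to any
    test protein is <= that cutoff, so lower cutoffs remove more of the
    proteins most similar to the test set.
    """
    # Closest test protein for each training complex.
    max_sim = [max(row) for row in sim]
    return {
        c: [tid for tid, s in zip(train_ids, max_sim) if s <= c]
        for c in cutoffs
    }


def fraction_removed(kept_ids, all_ids):
    """Share of the full training set excluded by a cutoff."""
    return 1.0 - len(kept_ids) / len(all_ids)


def pearson(x, y):
    """Pearson correlation between predicted and measured affinities."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical toy data: 4 training complexes vs. 2 test complexes.
    ids = ["a", "b", "c", "d"]
    sim = [[0.2, 0.9], [0.1, 0.3], [0.5, 0.4], [1.0, 0.0]]
    sets = similarity_filtered_training_sets(ids, sim, [0.4, 0.6, 1.0])
    for cutoff, kept in sets.items():
        print(cutoff, kept, fraction_removed(kept, ids))
```

In this toy example the maximum similarities are 0.9, 0.3, 0.5 and 1.0, so a cutoff of 0.4 keeps only one complex (75% removed), while a cutoff of 1.0 keeps all four. Each such subset would then be used to retrain an SF, which is scored on the same fixed test set, so that any drop in test performance can be attributed to the removed similar proteins.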
Keywords: machine learning; scoring function; molecular docking; binding affinity prediction
Figures

Figure 1

This is an open access article distributed under the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary material


Cite This Article

MDPI and ACS Style

Li, H.; Peng, J.; Leung, Y.; Leung, K.-S.; Wong, M.-H.; Lu, G.; Ballester, P.J. The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction. Biomolecules 2018, 8, 12.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Biomolecules EISSN 2218-273X Published by MDPI AG, Basel, Switzerland