Special Issue "Machine Learning for Molecular Modelling in Drug Design"

A special issue of Biomolecules (ISSN 2218-273X).

Deadline for manuscript submissions: closed (25 September 2018)

Special Issue Editor

Guest Editor
Dr. Pedro J. Ballester

INSERM, France
Interests: computational drug design; cancer pharmacogenomic modelling; biomarker discovery; applied machine learning

Special Issue Information

Dear Colleagues,

Machine learning (ML) has become a crucial component of early drug discovery. Two main factors have fuelled this research area. The first is the fast-growing availability of relevant experimental data. Examples include bioactivities of molecules of known chemical structure against non-molecular targets (cell lines, mouse models, etc.), binding affinities of such molecules against macromolecular targets, and X-ray crystal structures of proteins acting as drug targets. This trend has been catalysed by the development of community resources (e.g., ChEMBL, PubChem and PDB) that curate these data sets and facilitate their re-use for predictive modelling. The second factor is the easy access to high-quality implementations in R or Python of a range of ML algorithms, along with the continuous introduction of new advances (e.g., XGBoost, deep learning and conformal prediction). As a result, an increasing number of data-driven ML models are being proposed and found to offer advantages in identifying new starting points for the drug discovery process.

We invite scientists working in this area to submit their original research or review articles for publication in this Special Issue. Topics of interest include (but are not limited to) docking, QSAR, target prediction, virtual screening and lead optimization. Both application and methodology research studies are welcome.

Dr. Pedro J. Ballester
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biomolecules is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 650 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Predictive modelling
  • Docking
  • QSAR
  • Virtual screening
  • Lead optimization
  • Target prediction
  • Drug design

Published Papers (4 papers)

Research

Open Access | Feature Paper | Article
Predicting Aromatic Amine Mutagenicity with Confidence: A Case Study Using Conformal Prediction
Biomolecules 2018, 8(3), 85; https://doi.org/10.3390/biom8030085
Received: 26 June 2018 / Revised: 16 August 2018 / Accepted: 21 August 2018 / Published: 29 August 2018
Abstract
The occurrence of mutagenicity in primary aromatic amines has been investigated using conformal prediction. The results of the investigation show that it is possible to develop mathematically proven valid models using conformal prediction and that the existence of uncertain classes of prediction, such as both (both classes assigned to a compound) and empty (no class assigned to a compound), provides the user with additional information on how to use, further develop, and possibly improve future models. The study also indicates that the use of different sets of fingerprints results in models whose ability to discriminate varies with the chosen level of acceptable errors.
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)
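
As an illustration of the both and empty prediction classes mentioned in the abstract above, the following is a minimal sketch of a class-conditional (Mondrian) inductive conformal classifier. It is not the authors' implementation: the function names, the calibration scores, and the test scores are all hypothetical, and the nonconformity scores are assumed to have been computed already by some underlying model.

```python
def p_value(calibration_scores, test_score):
    """Conformal p-value: share of calibration nonconformity scores
    that are at least as large as the test score (with +1 smoothing)."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration, test_scores, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {
        label
        for label, score in test_scores.items()
        if p_value(calibration[label], score) > significance
    }


def describe(labels):
    """Map a prediction set to a single-label, 'both' or 'empty' outcome."""
    if not labels:
        return "empty"
    if len(labels) == 1:
        return next(iter(labels))
    return "both"


# Hypothetical calibration nonconformity scores per class (illustrative only).
calibration = {
    "mutagenic": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "non-mutagenic": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}

# Hypothetical nonconformity of one test compound under each candidate label.
test_scores = {"mutagenic": 0.15, "non-mutagenic": 0.85}

for significance in (0.1, 0.25, 0.95):
    outcome = describe(prediction_set(calibration, test_scores, significance))
    print(significance, outcome)
```

In this sketch a low significance level (few errors tolerated) tends to yield both, while a very high one tends to yield empty, which is one way to read the abstract's remark that discrimination depends on the accepted error level.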

Open Access | Feature Paper | Article
In Silico HCT116 Human Colon Cancer Cell-Based Models En Route to the Discovery of Lead-Like Anticancer Drugs
Biomolecules 2018, 8(3), 56; https://doi.org/10.3390/biom8030056
Received: 11 June 2018 / Revised: 10 July 2018 / Accepted: 11 July 2018 / Published: 17 July 2018
Abstract
To discover new inhibitors against the human colon carcinoma HCT116 cell line, two quantitative structure–activity relationship (QSAR) studies using molecular and nuclear magnetic resonance (NMR) descriptors were developed through exploration of machine learning techniques and using the value of half maximal inhibitory concentration (IC50). In the first approach, A, regression models were developed using a total of 7339 molecules that were extracted from the ChEMBL and ZINC databases and recent literature. The performance of the regression models was successfully evaluated by internal and external validations; the best model achieved R² of 0.75 and 0.73 and root mean square error (RMSE) of 0.66 and 0.69 for the training and test sets, respectively. Given the inherently time-consuming effort of working with natural products (NPs), we conceived a new NP drug hit discovery strategy, approach B, that consists of frontloading samples with 1D NMR descriptors to predict compounds with anticancer activity prior to bioactivity screening for NP discovery. The NMR QSAR classification models were built using 1D NMR data (¹H and ¹³C) as descriptors, from 50 crude extracts, 55 fractions and five pure compounds obtained from actinobacteria isolated from marine sediments collected off the Madeira Archipelago. The overall prediction accuracy of the best model exceeded 63% for both training and test sets.
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)
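
For readers unfamiliar with the two regression metrics quoted in the abstract above, the sketch below computes RMSE and R² for a small set of values. It assumes R² is the coefficient of determination (the abstract does not state which definition was used), and the activity values are hypothetical, not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted activities (illustrative only).
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]

print("RMSE:", rmse(observed, predicted))
print("R2:", r_squared(observed, predicted))
```

Reporting both metrics on the training and test sets, as the abstract does, helps show whether a model generalises: similar values on the two sets suggest limited overfitting.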

Open Access | Article
Pharmaceutical Machine Learning: Virtual High-Throughput Screens Identifying Promising and Economical Small Molecule Inhibitors of Complement Factor C1s
Biomolecules 2018, 8(2), 24; https://doi.org/10.3390/biom8020024
Received: 19 February 2018 / Revised: 26 April 2018 / Accepted: 27 April 2018 / Published: 7 May 2018
Abstract
When excessively activated, C1 is insufficiently regulated, which results in tissue damage. Such tissue damage causes the complement system to become further activated to remove the resulting tissue damage, and a vicious cycle of activation/tissue damage occurs. Current Food and Drug Administration-approved treatments include supplemental recombinant C1 inhibitor, but these are extremely costly and a more economical solution is desired. In our work, we have utilized an existing data set of 136 compounds that have been previously tested for activity against C1. Using these compounds and the activity data, we have created models using principal component analysis, genetic algorithm, and support vector machine approaches to characterize activity. The models were then utilized to virtually screen the 72-million-compound PubChem repository. This first round of virtual high-throughput screening (vHTS) identified many economical and promising inhibitor candidates, a subset of which was tested to validate their biological activity. These results were used to retrain the models and rescreen PubChem in a second round of vHTS. The hit rate was 57% for the first round of vHTS and 50% for the second round. Additional structure–property analysis was performed on the active and inactive compounds to identify interesting scaffolds for further investigation.
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)

Open Access | Feature Paper | Article
The Impact of Protein Structure and Sequence Similarity on the Accuracy of Machine-Learning Scoring Functions for Binding Affinity Prediction
Biomolecules 2018, 8(1), 12; https://doi.org/10.3390/biom8010012
Received: 8 February 2018 / Revised: 9 March 2018 / Accepted: 12 March 2018 / Published: 14 March 2018
Abstract
It has recently been claimed that the outstanding performance of machine-learning scoring functions (SFs) is exclusively due to the presence of training complexes with highly similar proteins to those in the test set. Here, we revisit this question using 24 similarity-based training sets, a widely used test set, and four SFs. Three of these SFs employ machine learning instead of the classical linear regression approach of the fourth SF (X-Score, which has the best test set performance out of 16 classical SFs). We have found that random forest (RF)-based RF-Score-v3 outperforms X-Score even when 68% of the most similar proteins are removed from the training set. In addition, unlike X-Score, RF-Score-v3 is able to keep learning with an increasing training set size, becoming substantially more predictive than X-Score when the full 1105 complexes are used for training. These results show that machine-learning SFs owe a substantial part of their performance to training on complexes with dissimilar proteins to those in the test set, contrary to what has been previously concluded using the same data. Given that a growing amount of structural and interaction data will be available from academic and industrial sources, this performance gap between machine-learning SFs and classical SFs is expected to widen in the future.
(This article belongs to the Special Issue Machine Learning for Molecular Modelling in Drug Design)