Molecules 2013, 18(1), 735-756; doi:10.3390/molecules18010735
Article

Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database

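Mariusz Butkiewicz, Edward W. Lowe, Jr., Ralf Mueller, Jeffrey L. Mendenhall, Pedro L. Teixeira, C. D. Weaver and Jens Meiler *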
Received: 26 September 2012; in revised form: 11 October 2012 / Accepted: 17 December 2012 / Published: 8 January 2013
(This article belongs to the Special Issue QSAR and Its Applications)
Abstract: With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, for example through the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate probe development and drug discovery efforts in academia and to reduce their cost. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is in the public domain through PubChem and has been carefully collated using confirmation screens that validate active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework, BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure-activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed, including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 are observed at a true positive rate (TPR) cutoff of 25%.
Keywords: virtual screening; machine learning; quantitative structure-activity relations (QSAR); high-throughput screening (HTS); cheminformatics; PubChem; BCL
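
The abstract reports enrichment at a TPR cutoff of 25%. As a rough illustration of how such a figure can be computed from ranked predictions, the sketch below assumes a common definition: enrichment is the fraction of actives among the selected compounds divided by the fraction of actives in the whole data set, evaluated at the point in the ranking where the chosen share of actives has been recovered. The function name `enrichment_at_tpr`, its parameters, and the toy data are hypothetical and are not taken from BCL::ChemInfo; the paper's exact definition and its handling of tied scores may differ.

```python
import math


def enrichment_at_tpr(scores, labels, tpr_cutoff=0.25):
    """Enrichment among top-ranked compounds once `tpr_cutoff` of actives is recovered.

    Assumed definition (not confirmed by the source):
        enrichment = (actives found / compounds selected) / (actives / all compounds)
    `labels` holds 1 for active and 0 for inactive compounds.
    """
    n_total = len(labels)
    n_active = sum(labels)
    if n_total == 0 or n_active == 0:
        raise ValueError("need at least one compound and one active")

    # Rank compounds from highest to lowest predicted score (ties are not treated specially).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Number of actives that must be recovered to reach the TPR cutoff.
    needed = max(1, math.ceil(tpr_cutoff * n_active))

    found = 0
    for n_selected, (_, is_active) in enumerate(ranked, start=1):
        found += is_active
        if found >= needed:
            precision = found / n_selected
            base_rate = n_active / n_total
            return precision / base_rate


# Toy data: 20 compounds, 4 of them active, with one active ranked first.
toy_scores = [1.0 - 0.05 * i for i in range(20)]
toy_labels = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
print(enrichment_at_tpr(toy_scores, toy_labels, tpr_cutoff=0.25))
```

Under this definition the upper bound on enrichment is the inverse of the active rate, so data sets with very few confirmed actives can reach much larger values than balanced ones.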
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.



MDPI and ACS Style

Butkiewicz, M.; Lowe, E.W., Jr.; Mueller, R.; Mendenhall, J.L.; Teixeira, P.L.; Weaver, C.D.; Meiler, J. Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database. Molecules 2013, 18, 735-756.

AMA Style

Butkiewicz M, Lowe EW, Jr, Mueller R, Mendenhall JL, Teixeira PL, Weaver CD, Meiler J. Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database. Molecules. 2013; 18(1):735-756.

Chicago/Turabian Style

Butkiewicz, Mariusz; Lowe, Edward W., Jr.; Mueller, Ralf; Mendenhall, Jeffrey L.; Teixeira, Pedro L.; Weaver, C. D.; Meiler, Jens. 2013. "Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database." Molecules 18, no. 1: 735-756.


Molecules, EISSN 1420-3049. Published by MDPI AG, Basel, Switzerland.