Open Access Article
Int. J. Mol. Sci. 2009, 10(7), 3106-3127; doi:10.3390/ijms10073106

Additive SMILES-Based Carcinogenicity Models: Probabilistic Principles in the Search for Robust Predictions

1
Institute of Geology and Geophysics, 100041, Khodzhibaev St. 49, Tashkent, Uzbekistan
2
Istituto di Ricerche Farmacologiche Mario Negri, 20156, Via La Masa 19, Milano, Italy
*
Author to whom correspondence should be addressed.
Received: 14 May 2009 / Revised: 23 June 2009 / Accepted: 2 July 2009 / Published: 8 July 2009
(This article belongs to the Special Issue Recent Advances in QSAR/QSPR Theory)

Abstract

Optimal descriptors calculated from the simplified molecular input line entry system (SMILES) have been used to model carcinogenicity as a continuous value (logTD50). These descriptors are computed from correlation weights of SMILES attributes, and the weights are obtained by the Monte Carlo method. A considerable subset of the attributes are rare, and the use of rare attributes can lead to overtraining. Their influence can be avoided by fixing their correlation weights to zero. A threshold, limS, has been defined to identify rare attributes: it is the minimum number of occurrences in the structures of the training (subtraining) set required for an attribute to be accepted as usable. An attribute that occurs fewer than limS times is considered “rare” and is not used. Two schemes for building models were examined: (1) the classic training-test scheme; and (2) the balance of correlations for the subtraining and calibration sets (together these form the original training set; the calibration set imitates a preliminary test set). Three random splits into subtraining, calibration, and test sets were analysed. Comparison of the two schemes has shown that the balance of correlations gives more robust predictions of carcinogenicity for all three splits (split 1: r²_test = 0.7514, s_test = 0.684; split 2: r²_test = 0.7998, s_test = 0.600; split 3: r²_test = 0.7192, s_test = 0.728).
Keywords: QSAR; SMILES; optimal descriptor; carcinogenicity; balance of correlations; applicability domain
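
The role of limS described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: each single SMILES character is treated as an attribute, an "occurrence" is counted once per training structure that contains the attribute, and the correlation weights are arbitrary numbers rather than Monte Carlo optimized values. All function names are hypothetical.

```python
from collections import Counter


def smiles_attributes(smiles):
    """Split a SMILES string into attributes.

    Assumption: every single character is one attribute. The attribute
    definition used in the article may differ.
    """
    return list(smiles)


def attribute_frequencies(training_smiles):
    """Count how many training structures contain each attribute."""
    counts = Counter()
    for smiles in training_smiles:
        # set() so an attribute is counted once per structure (assumption)
        counts.update(set(smiles_attributes(smiles)))
    return counts


def zero_rare_weights(weights, counts, lim_s):
    """Fix the correlation weight of every rare attribute to zero."""
    return {
        attr: (w if counts.get(attr, 0) >= lim_s else 0.0)
        for attr, w in weights.items()
    }


def descriptor(smiles, weights):
    """Additive descriptor: sum of correlation weights over all attributes."""
    return sum(weights.get(attr, 0.0) for attr in smiles_attributes(smiles))


# Toy training set and arbitrary, illustrative correlation weights.
training = ["CCO", "CC=O", "CCN", "c1ccccc1O"]
weights = {"C": 0.5, "O": -1.0, "N": 2.0, "=": 0.25, "c": 0.125, "1": 0.75}

counts = attribute_frequencies(training)
filtered = zero_rare_weights(weights, counts, lim_s=2)

for smiles in training:
    print(smiles, descriptor(smiles, filtered))
```

With lim_s=2 only "C" and "O" remain active in this toy set, so the weight assigned to "N" no longer contributes to the descriptor of "CCN". Raising limS makes the model depend on fewer, better-supported attributes, which is the trade-off against overtraining that the abstract describes.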
This is an open access article distributed under the Creative Commons Attribution License (CC BY 3.0).

MDPI and ACS Style

Toropov, A.A.; Toropova, A.P.; Benfenati, E. Additive SMILES-Based Carcinogenicity Models: Probabilistic Principles in the Search for Robust Predictions. Int. J. Mol. Sci. 2009, 10, 3106-3127.

Int. J. Mol. Sci. EISSN 1422-0067. Published by MDPI AG, Basel, Switzerland.