Additive SMILES-Based Carcinogenicity Models: Probabilistic Principles in the Search for Robust Predictions
Abstract: Optimal descriptors calculated from the simplified molecular input line entry system (SMILES) have been used to model carcinogenicity as a continuous value (logTD50). These descriptors are calculated from correlation weights of SMILES attributes, and the weights are obtained by the Monte Carlo method. A considerable fraction of these attributes are rare, and using rare attributes can lead to overtraining. Their influence can be avoided by fixing their correlation weights to zero. A threshold, limS, has been defined to identify rare attributes: limS is the minimum number of occurrences in the structures of the training (subtraining) set required for an attribute to be accepted as usable. An attribute that occurs fewer than limS times is considered “rare” and is not used. Two systems of building models were examined: (1) the classic training-test system; and (2) the balance of correlations for the subtraining and calibration sets (together these form the original training set; the calibration set serves as an imitation of a preliminary test set). Three random splits into subtraining, calibration, and test sets were analysed. Comparison of the two systems showed that the balance of correlations gives more robust prediction of carcinogenicity for all three splits (split 1: $r^2_{\mathrm{test}} = 0.7514$, $s_{\mathrm{test}} = 0.684$; split 2: $r^2_{\mathrm{test}} = 0.7998$, $s_{\mathrm{test}} = 0.600$; split 3: $r^2_{\mathrm{test}} = 0.7192$, $s_{\mathrm{test}} = 0.728$).
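The rare-attribute rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several things are assumptions made only for illustration: each SMILES character is treated as one attribute, an attribute's frequency is taken as the number of training structures that contain it, and the correlation weights are hand-picked placeholder numbers rather than values optimised by the Monte Carlo method. The names `smiles_attributes`, `attribute_frequencies`, `rare_attributes`, `descriptor`, and `lim_s` are hypothetical.

```python
from collections import Counter


def smiles_attributes(smiles):
    """Split a SMILES string into attributes.

    Assumption: one character = one attribute. Real SMILES attributes
    (for example two-letter atoms such as Cl) need a proper tokeniser.
    """
    return list(smiles)


def attribute_frequencies(training_smiles):
    """Count, for each attribute, how many training structures contain it."""
    counts = Counter()
    for smiles in training_smiles:
        # set() so that an attribute is counted once per structure (assumption)
        counts.update(set(smiles_attributes(smiles)))
    return counts


def rare_attributes(counts, lim_s):
    """Attributes that occur in fewer than lim_s training structures."""
    return {attr for attr, n in counts.items() if n < lim_s}


def descriptor(smiles, weights, rare):
    """Additive descriptor: sum of correlation weights over the attributes.

    Rare attributes contribute zero, as do attributes with no known weight.
    """
    return sum(
        0.0 if attr in rare else weights.get(attr, 0.0)
        for attr in smiles_attributes(smiles)
    )


# Toy training set and placeholder weights (illustrative values only).
training = ["CCO", "CCN", "CC=O", "C#N"]
weights = {"C": 0.5, "O": 1.0, "N": -0.25, "=": 2.0, "#": 3.0}

counts = attribute_frequencies(training)
rare = rare_attributes(counts, lim_s=2)  # '=' and '#' each occur in one structure

with_filter = descriptor("CC=O", weights, rare)      # '=' contributes zero
without_filter = descriptor("CC=O", weights, set())  # '=' keeps its weight
print(sorted(rare), with_filter, without_filter)
```

In this toy example the attribute `=` appears in only one training structure, so with `lim_s=2` its weight is ignored and the descriptor of `CC=O` depends only on attributes that are well represented in the training set.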
- Supplementary File 1: IJMS doi:10.3390/ijms10073106 (PDF, 92 KB)
Toropov, A.A.; Toropova, A.P.; Benfenati, E. Additive SMILES-Based Carcinogenicity Models: Probabilistic Principles in the Search for Robust Predictions. Int. J. Mol. Sci. 2009, 10, 3106-3127.