Article

Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints

1 Know-Center, Inffeldgasse 13, 8010 Graz, Austria
2 Centre for Applied Bioanthropology, Institute for Anthropological Research, 10000 Zagreb, Croatia
3 Institute of Interactive Systems and Data Science, Graz University of Technology, Inffeldgasse 16C, 8010 Graz, Austria
4 Copenhagen Studies on Asthma in Childhood, Herlev-Gentofte Hospital, University of Copenhagen, Ledreborg Alle 34, 2820 Gentofte, Denmark
5 Department of Food Science, University of Copenhagen, Rolighedsvej 26, 1958 Frederiksberg, Denmark
* Author to whom correspondence should be addressed.
Academic Editor: Osvaldo Andrade Santos-Filho
Pharmaceuticals 2021, 14(8), 758; https://doi.org/10.3390/ph14080758
Received: 30 June 2021 / Revised: 21 July 2021 / Accepted: 22 July 2021 / Published: 2 August 2021
(This article belongs to the Special Issue In Silico Approaches in Drug Design)
Dimensionality reduction methods contribute significantly to knowledge generation in high-dimensional modeling scenarios across many disciplines. A lower-dimensional representation (also called an embedding) requires fewer computing resources in downstream machine learning tasks, leading to faster training, lower complexity, and statistical flexibility. In this work, we investigate the utility of three prominent unsupervised embedding techniques, principal component analysis (PCA), uniform manifold approximation and projection (UMAP), and variational autoencoders (VAEs), for solving classification tasks in the domain of toxicology. To this end, we compare these embedding techniques against a set of molecular fingerprint-based models that do not utilize additional preprocessing of features. Inspired by the success of transfer learning in several fields, we further study the performance of embedders when trained on an external dataset of chemical compounds. To gain a better understanding of their characteristics, we evaluate the embedders with different embedding dimensionalities and with different sizes of the external dataset. Our findings show that the recently popularized UMAP approach can be utilized alongside known techniques such as PCA and VAE as a pre-compression technique in the toxicology domain. Nevertheless, the generative model of VAE shows an advantage in pre-compressing the data with respect to classification accuracy.
Keywords: manifold learning; machine learning; rdkit; embeddings; Tox21; principal component analysis; autoencoder
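
The transfer-learning setup summarized in the abstract (fit an unsupervised embedder on an external, unlabeled set of compounds, then reuse it to compress the fingerprints of a smaller labeled set before classification) can be pictured with a minimal sketch. This is not the authors' pipeline: the fingerprints are synthetic random bit vectors rather than RDKit output, no Tox21 data is used, PCA is computed with a plain NumPy SVD, and a nearest-centroid rule stands in for the downstream classifier. All function and variable names are hypothetical.

```python
import numpy as np


def fit_pca(X, n_components):
    """Fit PCA on X (rows = compounds); return the mean and principal axes."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def embed(X, mean, components):
    """Project fingerprints onto the previously fitted principal axes."""
    return (X - mean) @ components.T


def synthetic_fingerprints(rng, n, bit_probs):
    """Assumption: random bit vectors stand in for real molecular fingerprints."""
    return (rng.random((n, bit_probs.size)) < bit_probs).astype(float)


rng = np.random.default_rng(0)
n_bits, n_dims = 256, 8

# Two hypothetical classes that differ in which bits are frequently set.
p0 = np.full(n_bits, 0.1)
p0[:32] = 0.6
p1 = np.full(n_bits, 0.1)
p1[32:64] = 0.6

# External, unlabeled compounds: used only to fit the embedder.
X_external = np.vstack([
    synthetic_fingerprints(rng, 500, p0),
    synthetic_fingerprints(rng, 500, p1),
])
mean, components = fit_pca(X_external, n_dims)

# Smaller labeled set for the downstream classification task.
X_train = np.vstack([synthetic_fingerprints(rng, 100, p0),
                     synthetic_fingerprints(rng, 100, p1)])
y_train = np.repeat([0, 1], 100)
X_test = np.vstack([synthetic_fingerprints(rng, 50, p0),
                    synthetic_fingerprints(rng, 50, p1)])
y_test = np.repeat([0, 1], 50)

# Transfer step: the embedder fitted on external data compresses the task data.
Z_train = embed(X_train, mean, components)
Z_test = embed(X_test, mean, components)

# Nearest-centroid classifier in the embedded space (a stand-in model).
centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in (0, 1)])
dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
pred = dists.argmin(axis=1)
accuracy = (pred == y_test).mean()
print(f"embedded shape: {Z_train.shape}, test accuracy: {accuracy:.2f}")
```

Varying `n_dims` or the number of rows in `X_external` mirrors, in a toy setting, the two factors the abstract says were studied: embedding dimensionality and the size of the external dataset. A UMAP or VAE embedder could in principle be dropped in behind the same fit-then-embed interface, although that is not shown here.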
MDPI and ACS Style

Lovrić, M.; Đuričić, T.; Tran, H.T.N.; Hussain, H.; Lacić, E.; Rasmussen, M.A.; Kern, R. Should We Embed in Chemistry? A Comparison of Unsupervised Transfer Learning with PCA, UMAP, and VAE on Molecular Fingerprints. Pharmaceuticals 2021, 14, 758. https://doi.org/10.3390/ph14080758
