Article

FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space

1 Department of Computer Science and Engineering, Nirma University, Ahmedabad 382481, India
2 Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St., Liverpool L69 7ZB, UK
3 Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs Lyngby, Denmark
4 Mellizyme Ltd., Liverpool Science Park, IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
* Author to whom correspondence should be addressed.
Academic Editor: Amarda Shehu
Molecules 2021, 26(7), 2065; https://doi.org/10.3390/molecules26072065
Received: 1 March 2021 / Revised: 29 March 2021 / Accepted: 1 April 2021 / Published: 3 April 2021
(This article belongs to the Section Chemical Biology)
The question of molecular similarity is core to cheminformatics and is usually assessed via a pairwise comparison based on vectors of properties or molecular fingerprints. We recently exploited variational autoencoders to embed 6M molecules in a chemical space, such that their (Euclidean) distance within the latent space so formed could be assessed within the framework of the entire molecular set. However, the standard objective function used did not seek to manipulate the latent space so as to cluster the molecules based on any perceived similarity. Using a set of some 160,000 molecules of biological relevance, we here bring together three modern elements of deep learning to create a novel and disentangled latent space, viz. transformers, contrastive learning, and an embedded autoencoder. The effective dimensionality of the latent space was varied such that clear separation of individual types of molecules could be observed within individual dimensions of the latent space. The capacity of the network was such that many dimensions were not populated at all. As before, we assessed the utility of the representation by comparing clozapine with its near neighbors, and we also did the same for various antibiotics related to flucloxacillin. Transformers, especially when coupled, as here, with contrastive learning, effectively provide one-shot learning and lead to a successful and disentangled representation of molecular latent spaces that at once uses the entire training set in their construction while allowing "similar" molecules to cluster together in an effective and interpretable way.
Keywords: deep learning; artificial intelligence; generative methods; chemical space; neural networks; transformers; attention; cheminformatics
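The contrastive objective summarized in the abstract can be sketched with a generic NT-Xent-style loss, in which two "views" of the same molecule (e.g. two enumerated SMILES strings) form a positive pair and all other molecules in the batch serve as negatives. This is a minimal NumPy illustration of the general technique, not the paper's exact implementation; the function and variable names are hypothetical.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.

    z1, z2: (N, D) arrays of embeddings for two views of the same N
    molecules. Positive pairs are (z1[i], z2[i]); every other pair in
    the 2N-example batch acts as a negative.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-normalize -> cosine similarity
    sim = z @ z.T / temperature                        # (2N, 2N) scaled similarity matrix
    n = z1.shape[0]
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # row i's positive partner sits at index i + n (and vice versa)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # cross-entropy: -log softmax probability of the positive pair
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

Minimizing this loss pulls the two views of each molecule together in the latent space while pushing unrelated molecules apart, which is what allows "similar" molecules to cluster along individual latent dimensions.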
MDPI and ACS Style

Shrivastava, A.D.; Kell, D.B. FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space. Molecules 2021, 26, 2065. https://doi.org/10.3390/molecules26072065

