Unsupervised Representation Learning for Proteochemometric Modeling

Bayer Machine Learning Research, Müllerstraße 178, 13353 Berlin, Germany
Authors to whom correspondence should be addressed.
Academic Editor: Bono Lučić
Int. J. Mol. Sci. 2021, 22(23), 12882
Received: 15 October 2021 / Revised: 25 November 2021 / Accepted: 26 November 2021 / Published: 28 November 2021
In silico protein–ligand binding prediction is an ongoing area of research in computational chemistry and machine-learning-based drug discovery, as an accurate predictive model could greatly reduce the time and resources necessary for the detection and prioritization of possible drug candidates. Proteochemometric modeling (PCM) attempts to create an accurate model of the protein–ligand interaction space by combining explicit protein and ligand descriptors. This requires the creation of information-rich, uniform and computer-interpretable representations of proteins and ligands. Previous PCM studies rely on pre-defined, handcrafted feature extraction methods, and many methods use protein descriptors that require alignment or are otherwise specific to a particular group of related proteins. However, recent advances in representation learning have shown that unsupervised machine learning can be used to generate embeddings that outperform complex, human-engineered representations. Several different embedding methods for proteins and molecules have been developed based on various language-modeling methods. Here, we demonstrate the utility of these unsupervised representations and compare three protein embeddings and two compound embeddings in a fair manner. We evaluate performance on various splits of a benchmark dataset, as well as on an internal dataset of protein–ligand binding activities, and find that unsupervised-learned representations significantly outperform handcrafted representations.
Keywords: unsupervised representation learning; computational biology; protein–ligand binding prediction
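
To illustrate the idea of combining explicit protein and ligand descriptors described in the abstract, the following is a minimal sketch, not the authors' pipeline. It assumes that each protein and each ligand already has a fixed-length embedding, forms the PCM input by concatenating the two vectors, and uses a k-nearest-neighbour regressor as a stand-in for whatever downstream model is actually used. All identifiers, embedding values and activity values are made up for illustration.

```python
from math import dist


def pair_features(protein_vec, ligand_vec):
    """Concatenate a protein descriptor and a ligand descriptor into one PCM input."""
    return list(protein_vec) + list(ligand_vec)


def knn_predict(train_X, train_y, query, k=1):
    """Predict an activity as the mean label of the k nearest training pairs."""
    ranked = sorted(zip(train_X, train_y), key=lambda xy: dist(xy[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical embeddings; in practice these would come from pretrained
# unsupervised protein and compound models.
protein_emb = {"P1": [0.0, 1.0], "P2": [1.0, 0.0]}
ligand_emb = {"L1": [0.5, 0.5, 0.0], "L2": [0.0, 0.2, 0.9]}

# Hypothetical measured (protein, ligand, activity) triples.
measurements = [("P1", "L1", 7.0), ("P1", "L2", 5.0), ("P2", "L1", 6.0)]

train_X = [pair_features(protein_emb[p], ligand_emb[l]) for p, l, _ in measurements]
train_y = [activity for _, _, activity in measurements]

# Predict the unmeasured pair (P2, L2) from its two nearest measured pairs.
query = pair_features(protein_emb["P2"], ligand_emb["L2"])
prediction = knn_predict(train_X, train_y, query, k=2)
print(prediction)
```

Because the model sees both descriptors at once, a single model can in principle cover many proteins and many ligands, including pairs that were never measured together.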

MDPI and ACS Style

Kim, P.T.; Winter, R.; Clevert, D.-A. Unsupervised Representation Learning for Proteochemometric Modeling. Int. J. Mol. Sci. 2021, 22, 12882.
