Special Issue "Advancing Cheminformatics—A Theme Issue in Honor of Professor Jürgen Bajorath"

A special issue of Molecules (ISSN 1420-3049). This special issue belongs to the section "Computational and Theoretical Chemistry".

Deadline for manuscript submissions: closed (30 September 2021) | Viewed by 15361

Special Issue Editor

Dr. Martin Vogt
Guest Editor
Department of Life Science Informatics, b-it Institute of the University of Bonn, Bonn, Germany
Interests: mathematics and computer science; cheminformatics; virtual screening methods; algorithmic methods; mathematical, statistical, and data mining approaches for chemoinformatic questions

Special Issue Information

Dear Colleagues,

In the last 15–20 years, Prof. Dr. Jürgen Bajorath has been one of the leading figures in cheminformatics and in chemical information sciences. With over 700 publications to his name, he has shaped the field in many ways. His research focuses on the development of methods for the analysis and prediction of bioactive molecules and their application in pharmaceutical research. His publications cover a wide range of areas, including the development of methods and algorithms for:

  • molecular similarity analysis and computer-based hit and lead identification;
  • analysis, characterization, and visualization of systematic structure–activity relationships;
  • analysis of lead optimization efforts;
  • analysis of big data in medicinal chemistry.

Furthermore, Prof. Bajorath has made substantial contributions in the application of machine learning, data mining, and visualization techniques to the field.

Since 2004, Prof. Bajorath has served as the Chair of Life Science Informatics at the Bonn-Aachen International Center for Information Technology, which is associated with the University of Bonn. In the past 16 years, 30 PhD students have graduated under his supervision.

In 2015, he received the Herman Skolnik Award and in 2018 the National Award for Computers in Chemical and Pharmaceutical Research from the American Chemical Society. In 2016, he received the inaugural Fujita Award of the Hansch-Fujita Foundation.

Considering his outstanding accomplishments, we would like to dedicate a Special Issue in honor of Prof. Jürgen Bajorath with a collection of reviews and original articles from his research areas.

Dr. Martin Vogt
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Molecules is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2300 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • cheminformatics
  • molecular similarity
  • virtual screening
  • structure–activity relationship
  • lead optimization
  • big data
  • machine learning
  • data mining
  • networks
  • visualization

Published Papers (15 papers)

Editorial

Editorial
Advancing Cheminformatics—A Theme Issue in Honor of Professor Jürgen Bajorath
Molecules 2022, 27(8), 2542; https://doi.org/10.3390/molecules27082542 - 14 Apr 2022
Viewed by 272
Abstract
While cheminformatics problems have been actively researched since the early 1960s, as witnessed by the QSAR approaches developed by Toshio Fujita and Corwin Hansch [...] Full article

Research

Article
Set-Theoretic Formalism for Treating Ligand-Target Datasets
Molecules 2021, 26(24), 7419; https://doi.org/10.3390/molecules26247419 - 07 Dec 2021
Viewed by 639
Abstract
Data on ligand–target (LT) interactions has played a growing role in drug research for several decades. Even though the amount of data has grown significantly in size and coverage during this period, most datasets remain difficult to analyze because of their extreme sparsity, as there is no activity data whatsoever for many LT pairs. Even within clusters of data there tends to be a lack of data completeness, making the analysis of LT datasets problematic. The current effort extends earlier works on the development of set-theoretic formalisms for treating thresholded LT datasets. Unlike many approaches that do not address pairs of unknown interaction, the current work specifically takes account of their presence in addition to that of active and inactive pairs. Because a given LT pair can be in any one of three states, the binary logic of classical set-theoretic methods does not strictly apply. The current work develops a formalism, based on ternary set-theoretic relations, for treating thresholded LT datasets. It also describes an extension of the concept of data completeness, which is typically applied to sets of ligands and targets, to the local data completeness of individual ligands and targets. The set-theoretic formalism is applied to the analysis of simple and joint polypharmacologies based on LT activity profiles, and it is shown that null pairs provide a means for determining bounds to these values. The methodology is applied to a dataset of protein kinase inhibitors as an illustration of the method. Although not dealt with here, work is currently underway on a more refined treatment of activity values that is based on increasing the number of activity classes. Full article
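The three-state view of LT pairs described in this abstract can be illustrated with a minimal sketch. It is one possible reading and not the authors' formalism: the state labels, the function names, and the example profile are hypothetical. It assumes that unknown (null) pairs could turn out active or inactive, so they widen the interval between the lower and upper bound on the number of targets a ligand hits.

```python
# Hypothetical state labels for a thresholded ligand-target (LT) pair.
ACTIVE, INACTIVE, UNKNOWN = "active", "inactive", "unknown"


def polypharmacology_bounds(profile):
    """Return (lower, upper) bounds on the number of targets a ligand is active against.

    Assumption: every unknown pair may still turn out to be active,
    so unknown pairs only raise the upper bound.
    """
    states = list(profile.values())
    lower = states.count(ACTIVE)
    upper = lower + states.count(UNKNOWN)
    return lower, upper


def local_completeness(profile):
    """Fraction of targets with a known (active or inactive) outcome for one ligand."""
    known = sum(1 for state in profile.values() if state != UNKNOWN)
    return known / len(profile)


# Illustrative activity profile of one ligand against five targets.
profile = {
    "kinase_1": ACTIVE,
    "kinase_2": UNKNOWN,
    "kinase_3": INACTIVE,
    "kinase_4": ACTIVE,
    "kinase_5": UNKNOWN,
}

bounds = polypharmacology_bounds(profile)
completeness = local_completeness(profile)
```

With full data completeness the two bounds coincide; the more null pairs a profile contains, the wider the interval becomes.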

Article
Greedy 3-Point Search (G3PS)—A Novel Algorithm for Pharmacophore Alignment
Molecules 2021, 26(23), 7201; https://doi.org/10.3390/molecules26237201 - 27 Nov 2021
Viewed by 548
Abstract
Chemical features of small molecules can be abstracted to 3D pharmacophore models, which are easy to generate, interpret, and adapt by medicinal chemists. Three-dimensional pharmacophores can be used to efficiently match and align molecules according to their chemical feature pattern, which facilitates the virtual screening of even large compound databases. Existing alignment methods, used in computational drug discovery and bio-activity prediction, are often not suitable for finding matches between pharmacophores accurately as they purely aim to minimize RMSD or maximize volume overlap, when the actual goal is to match as many features as possible within the positional tolerances of the pharmacophore features. As a consequence, the obtained alignment results are often suboptimal in terms of the number of geometrically matched feature pairs, which increases the false-negative rate, thus negatively affecting the outcome of virtual screening experiments. We addressed this issue by introducing a new alignment algorithm, Greedy 3-Point Search (G3PS), which aims at finding optimal alignments by using a matching-feature-pair maximizing search strategy while at the same time being faster than competing methods. Full article

Article
Evaluating High-Variance Leaves as Uncertainty Measure for Random Forest Regression
Molecules 2021, 26(21), 6514; https://doi.org/10.3390/molecules26216514 - 28 Oct 2021
Cited by 1 | Viewed by 763
Abstract
Uncertainty measures estimate the reliability of a predictive model. Especially in the field of molecular property prediction as part of drug design, model reliability is crucial. Besides other techniques, Random Forests have a long tradition in machine learning related to chemoinformatics and are widely used. Random Forests consist of an ensemble of individual regression models, namely, decision trees, and therefore provide an uncertainty measure by construction. Regarding the disagreement of single-model predictions, a narrower distribution of predictions is interpreted as higher reliability. The standard deviation of the decision tree ensemble predictions is the default uncertainty measure for Random Forests. Due to the increasing application of machine learning in drug design, there is a constant search for novel uncertainty measures that, ideally, outperform classical uncertainty criteria. When analyzing Random Forests, it appears obvious to consider the variance of the dependent variables within each terminal decision tree leaf to obtain predictive uncertainties. Here, predictions that arise from more leaves of high variance are considered less reliable. As expected, the number of such high-variance leaves yields a reasonable uncertainty measure. Depending on the dataset, it can also outperform ensemble uncertainties. However, small-scale comparisons, i.e., considering only a few datasets, are insufficient, since they are more prone to chance correlations. Therefore, large-scale estimations are required to make general claims about the performance of uncertainty measures. On several chemoinformatic regression datasets, high-variance leaves are compared to the standard deviation of ensemble predictions. It turns out that high-variance leaf uncertainty is meaningful but not superior to the default ensemble standard deviation. A brief possible explanation is offered. Full article
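The two uncertainty measures compared in this abstract can be sketched for a single query compound as follows. This is a simplified illustration under stated assumptions, not the authors' code: the numbers are invented, and the fixed variance threshold that marks a leaf as "high-variance" is a hypothetical choice, since the abstract does not say how that cutoff is set.

```python
from statistics import pstdev


def ensemble_std(tree_predictions):
    """Default measure: standard deviation of the individual tree predictions."""
    return pstdev(tree_predictions)


def high_variance_leaf_count(leaf_variances, threshold):
    """Alternative measure: number of trees whose terminal leaf for this
    compound has a training-label variance above the (assumed) threshold."""
    return sum(1 for variance in leaf_variances if variance > threshold)


# Invented values for one query compound in a five-tree forest.
tree_predictions = [6.1, 6.3, 5.9, 6.2, 6.0]    # prediction of each tree
leaf_variances = [0.05, 0.40, 0.10, 0.55, 0.08]  # label variance in the leaf reached

uncertainty_std = ensemble_std(tree_predictions)
uncertainty_hvl = high_variance_leaf_count(leaf_variances, threshold=0.30)
```

In both cases a larger value marks the prediction as less reliable; the first measure uses disagreement between trees, the second uses label spread inside the leaves the compound falls into.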

Article
Improved Lipophilicity and Aqueous Solubility Prediction with Composite Graph Neural Networks
Molecules 2021, 26(20), 6185; https://doi.org/10.3390/molecules26206185 - 13 Oct 2021
Cited by 1 | Viewed by 724
Abstract
The accurate prediction of molecular properties, such as lipophilicity and aqueous solubility, is of great importance and poses challenges in several stages of the drug discovery pipeline. Machine learning methods, such as graph-based neural networks (GNNs), have shown exceptionally good performance in predicting these properties. In this work, we introduce a novel GNN architecture, called directed edge graph isomorphism network (D-GIN). It is composed of two distinct sub-architectures (D-MPNN, GIN) and achieves an improvement in accuracy over its sub-architectures when employing various learning and featurization strategies. We argue that combining models with different key aspects helps make graph neural networks deeper and simultaneously increases their predictive power. Furthermore, we address current limitations in the assessment of deep-learning models, namely, comparison of single training run performance metrics, and offer a more robust solution. Full article

Article
Rapid Identification of Potential Drug Candidates from Multi-Million Compounds’ Repositories. Combination of 2D Similarity Search with 3D Ligand/Structure Based Methods and In Vitro Screening
Molecules 2021, 26(18), 5593; https://doi.org/10.3390/molecules26185593 - 15 Sep 2021
Viewed by 806
Abstract
Rapid in silico selection of target focused libraries from commercial repositories is an attractive and cost-effective approach in early drug discovery. If structures of active compounds are available, rapid 2D similarity search can be performed on multimillion compounds’ databases. This approach can be combined with physico-chemical parameter and diversity filtering, bioisosteric replacements, and fragment-based approaches for performing a first round biological screening. Our objectives were to investigate the combination of 2D similarity search with various 3D ligand and structure-based methods for hit expansion and validation, in order to increase the hit rate and novelty. In the present account, six case studies are described and the efficiency of mixing is evaluated. While sequentially combined 2D/3D similarity approach increases the hit rate significantly, sequential combination of 2D similarity with pharmacophore model or 3D docking enriched the resulting focused library with novel chemotypes. Parallel integrated approaches allowed the comparison of the various 2D and 3D methods and revealed that 2D similarity-based and 3D ligand and structure-based techniques are often complementary, and their combinations represent a powerful synergy. Finally, the lessons we learnt including the advantages and pitfalls of the described approaches are discussed. Full article
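As a rough illustration of the 2D similarity search step mentioned in this abstract, the sketch below ranks library compounds by Tanimoto similarity to a known active. The abstract does not state which similarity metric or fingerprints were used; Tanimoto on binary fingerprints is assumed here only because it is a common choice, and the fingerprints, compound names, and cutoff are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query_fp, library, cutoff):
    """Return (name, similarity) pairs at or above the cutoff, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [item for item in scored if item[1] >= cutoff]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Toy fingerprints; real ones would come from a cheminformatics toolkit.
query = {1, 4, 7, 9}
library = {
    "cpd_a": {1, 4, 7, 8},
    "cpd_b": {2, 3, 5},
    "cpd_c": {1, 4, 7, 9, 12},
}

hits = similarity_search(query, library, cutoff=0.5)
```

In a sequential 2D/3D workflow of the kind the abstract describes, a ranked list like `hits` would be the input to the subsequent 3D ligand-based or structure-based step.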

Article
Congenericity of Claimed Compounds in Patent Applications
Molecules 2021, 26(17), 5253; https://doi.org/10.3390/molecules26175253 - 30 Aug 2021
Cited by 1 | Viewed by 976
Abstract
A method is presented to analyze quantitatively the degree of congenericity of claimed compounds in patent applications. The approach successfully differentiates patents exemplified with highly congeneric compounds of a structurally compact and well defined chemical series from patents containing a more diverse set of compounds around a more vaguely described patent claim. An application to 750 common patents available in SureChEMBL, SureChEMBLccs and ChEMBL is presented and the congenericity of patent compounds in those different sources discussed. Full article

Article
Chemoinformatics Analyses of Tau Ligands Reveal Key Molecular Requirements for the Identification of Potential Drug Candidates against Tauopathies
Molecules 2021, 26(16), 5039; https://doi.org/10.3390/molecules26165039 - 20 Aug 2021
Viewed by 797
Abstract
Tau is a highly soluble protein mainly localized at a cytoplasmic level in the neuronal cells, which plays a crucial role in the regulation of microtubule dynamic stability. Recent studies have demonstrated that several factors, such as hyperphosphorylation or alterations of Tau metabolism, may contribute to the pathological accumulation of protein aggregates, which can result in neuronal death and the onset of a number of neurological disorders called Tauopathies. At present, there are no available therapeutic remedies able to reduce Tau aggregation, nor are there any structural clues or guidelines for the rational identification of compounds preventing the accumulation of protein aggregates. To help identify the structural properties required for anti-Tau aggregation activity, we performed extensive chemoinformatics analyses on a dataset of Tau ligands reported in ChEMBL. The performed analyses allowed us to identify a set of molecular properties that are in common between known active ligands. Moreover, extensive analyses of the fragment composition of reported ligands led to the identification of chemical moieties and fragment combinations prevalent in the more active compounds. Interestingly, many of these fragments were arranged in recurring frameworks, some of which were clearly present in compounds currently under clinical investigation. This work represents the first in-depth chemoinformatics study of the molecular properties, constituting fragments and similarity profiles, of known Tau aggregation inhibitors. The datasets of compounds employed for the analyses, the identified molecular fragments and their combinations are made publicly available as supplementary material. Full article

Article
Interpretation of Ligand-Based Activity Cliff Prediction Models Using the Matched Molecular Pair Kernel
Molecules 2021, 26(16), 4916; https://doi.org/10.3390/molecules26164916 - 13 Aug 2021
Viewed by 1050
Abstract
Activity cliffs (ACs) are formed by two structurally similar compounds with a large difference in potency. Accurate AC prediction is expected to help researchers’ decisions in the early stages of drug discovery. Previously, predictive models based on matched molecular pair (MMP) cliffs have been proposed. However, the proposed methods face a challenge of interpretability due to the black-box character of the predictive models. In this study, we developed interpretable MMP fingerprints and modified a model-specific interpretation approach for models based on a support vector machine (SVM) and MMP kernel. We compared important features highlighted by this SVM-based interpretation approach and the SHapley Additive exPlanations (SHAP) as a major model-independent approach. The model-specific approach could capture the difference between AC and non-AC, while SHAP assigned high weights to the features not present in the test instances. For specific MMPs, the feature weights mapped by the SVM-based interpretation method were in agreement with the previously confirmed binding knowledge from X-ray co-crystal structures, indicating that this method is able to interpret the AC prediction model in a chemically intuitive manner. Full article

Article
CYPstrate: A Set of Machine Learning Models for the Accurate Classification of Cytochrome P450 Enzyme Substrates and Non-Substrates
Molecules 2021, 26(15), 4678; https://doi.org/10.3390/molecules26154678 - 02 Aug 2021
Cited by 1 | Viewed by 1067
Abstract
The interaction of small organic molecules such as drugs, agrochemicals, and cosmetics with cytochrome P450 enzymes (CYPs) can lead to substantial changes in the bioavailability of active substances and hence consequences with respect to pharmacological efficacy and toxicity. Therefore, efficient means of predicting the interactions of small organic molecules with CYPs are of high importance to a host of different industries. In this work, we present a new set of machine learning models for the classification of xenobiotics into substrates and non-substrates of nine human CYP isozymes: CYPs 1A2, 2A6, 2B6, 2C8, 2C9, 2C19, 2D6, 2E1, and 3A4. The models are trained on an extended, high-quality collection of known substrates and non-substrates and have been subjected to thorough validation. Our results show that the models yield competitive performance and are favorable for the detection of CYP substrates. In particular, a new consensus model reached high performance, with Matthews correlation coefficients (MCCs) between 0.45 (CYP2C8) and 0.85 (CYP3A4), although at the cost of coverage. The best models presented in this work are accessible free of charge via the “CYPstrate” module of the New E-Resource for Drug Discovery (NERDD). Full article
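For readers unfamiliar with the Matthews correlation coefficient (MCC) quoted in this abstract, a minimal sketch of its standard definition from a binary confusion matrix follows. The counts are invented for illustration and are not taken from the paper.

```python
from math import sqrt


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; returns 0.0 when the denominator is zero."""
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented counts for a substrate / non-substrate classifier.
mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
```

MCC ranges from -1 to 1, with 0 corresponding to random guessing, which makes it less sensitive to class imbalance than plain accuracy.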

Article
Prediction of Molecular Properties Using Molecular Topographic Map
Molecules 2021, 26(15), 4475; https://doi.org/10.3390/molecules26154475 - 24 Jul 2021
Cited by 1 | Viewed by 1024
Abstract
Prediction of molecular properties plays a critical role towards rational drug design. In this study, the Molecular Topographic Map (MTM) is proposed, which is a two-dimensional (2D) map that can be used to represent a molecule. An MTM is generated from the atomic features set of a molecule using generative topographic mapping and is then used as input data for analyzing structure-property/activity relationships. In the visualization and classification of 20 amino acids, differences of the amino acids can be visually confirmed from and revealed by hierarchical clustering with a similarity matrix of their MTMs. The prediction of molecular properties was performed on the basis of convolutional neural networks using MTMs as input data. The performance of the predictive models using MTM was found to be equal to or better than that using Morgan fingerprint or MACCS keys. Furthermore, data augmentation of MTMs using mixup has improved the prediction performance. Since molecules converted to MTMs can be treated like 2D images, they can be easily used with existing neural networks for image recognition and related technologies. MTM can be effectively utilized to predict molecular properties of small molecules to aid drug discovery research. Full article
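The mixup augmentation mentioned in this abstract can be sketched as a convex combination of two inputs and their labels. This is the generic mixup recipe, not the authors' implementation: the 2x2 grids standing in for MTMs, the property values, and the Beta distribution parameters are all assumptions made for illustration.

```python
import random


def mixup(map_a, label_a, map_b, label_b, lam):
    """Blend two 2D maps and their labels with mixing weight lam in [0, 1]."""
    mixed_map = [
        [lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(map_a, map_b)
    ]
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed_map, mixed_label


# Tiny stand-ins for two molecular topographic maps and their property values.
map_a, label_a = [[1.0, 0.0], [0.0, 1.0]], 1.0
map_b, label_b = [[0.0, 2.0], [2.0, 0.0]], 3.0

# In practice the weight is usually drawn from a Beta distribution;
# alpha = 0.2 is an assumed value, not one reported in the abstract.
rng = random.Random(0)
sampled_lam = rng.betavariate(0.2, 0.2)

# A fixed weight keeps this example deterministic.
mixed_map, mixed_label = mixup(map_a, label_a, map_b, label_b, lam=0.25)
```

Because MTMs are image-like arrays, the same element-wise blending used for images applies without modification.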

Article
DMSO Solubility Assessment for Fragment-Based Screening
Molecules 2021, 26(13), 3950; https://doi.org/10.3390/molecules26133950 - 28 Jun 2021
Cited by 2 | Viewed by 932
Abstract
In this paper, we report comprehensive experimental and chemoinformatics analyses of the solubility of small organic molecules (“fragments”) in dimethyl sulfoxide (DMSO) in the context of their ability to be tested in screening experiments. Here, DMSO solubility of 939 fragments has been measured experimentally using an NMR technique. A Support Vector Classification model was built on the obtained data using the ISIDA fragment descriptors. The analysis revealed 34 outliers: experimental issues were retrospectively identified for 28 of them. The updated model performs well in 5-fold cross-validation (balanced accuracy = 0.78). The datasets are available on the Zenodo platform (DOI:10.5281/zenodo.4767511) and the model is available on the website of the Laboratory of Chemoinformatics. Full article
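The balanced accuracy reported in this abstract is the mean of sensitivity and specificity. A minimal sketch with invented counts (not the paper's data) shows why it is preferred over plain accuracy when soluble and insoluble fragments are unevenly represented:

```python
def balanced_accuracy(tp, tn, fp, fn):
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(tp, tn, fp, fn):
    """Plain fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Invented, imbalanced example: 100 soluble and 20 insoluble fragments.
counts = dict(tp=80, tn=14, fp=6, fn=20)
ba = balanced_accuracy(**counts)
acc = accuracy(**counts)
```

Here plain accuracy is pulled toward the majority class, while balanced accuracy weights both classes equally.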

Article
Tubulin Inhibitors: A Chemoinformatic Analysis Using Cell-Based Data
Molecules 2021, 26(9), 2483; https://doi.org/10.3390/molecules26092483 - 24 Apr 2021
Cited by 3 | Viewed by 1853
Abstract
Inhibiting the tubulin-microtubules (Tub-Mts) system is a classic and rational approach for treating different types of cancers. A large amount of data on inhibitors in the clinic supports Tub-Mts as a validated target. However, most of the inhibitors reported thus far have been developed around common chemical scaffolds covering a narrow region of the chemical space with limited innovation. This manuscript aims to discuss the first activity landscape and scaffold content analysis of an assembled and curated cell-based database of 851 Tub-Mts inhibitors with reported activity against five cancer cell lines and the Tub-Mts system. The structure–bioactivity relationships of the Tub-Mts system inhibitors were further explored using constellations plots. This recently developed methodology enables the rapid but quantitative assessment of analog series enriched with active compounds. The constellations plots identified promising analog series with high average biological activity that could be the starting points of new and more potent Tub-Mts inhibitors. Full article

Review

Review
Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications
Molecules 2021, 26(17), 5291; https://doi.org/10.3390/molecules26175291 - 31 Aug 2021
Cited by 1 | Viewed by 952
Abstract
Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis–Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments. Full article

Review
Recent Advances in In Silico Target Fishing
Molecules 2021, 26(17), 5124; https://doi.org/10.3390/molecules26175124 - 24 Aug 2021
Cited by 5 | Viewed by 1599
Abstract
In silico target fishing, whose aim is to identify possible protein targets for a query molecule, is an emerging approach used in drug discovery due to its wide variety of applications. This strategy allows the clarification of the mechanism of action and biological activities of compounds whose target is still unknown. Moreover, target fishing can be employed for the identification of off-targets of drug candidates, thus recognizing and preventing their possible adverse effects. For these reasons, target fishing has increasingly become a key approach for polypharmacology, drug repurposing, and the identification of new drug targets. While experimental target fishing can be lengthy and difficult to implement, due to the plethora of interactions that may occur for a single small molecule with different protein targets, an in silico approach can be quicker, less expensive, more efficient for specific protein structures, and thus easier to employ. Moreover, the possibility of using it in combination with docking and virtual screening studies, as well as the increasing number of recently developed web-based tools, makes target fishing a more appealing method for drug discovery. It is especially worth underlining the increasing implementation of machine learning in this field, both as a main target fishing approach and as a further development of already applied strategies. This review reports on the main in silico target fishing strategies, belonging to both ligand-based and receptor-based approaches, developed and applied in recent years, with particular attention to the different web tools freely accessible to the scientific community for performing target fishing studies. Full article