Search Results (12)

Search Parameters:
Keywords = calculated fingerprints of the molecular similarity

36 pages, 9116 KiB  
Article
Computational Investigation of Montelukast and Its Structural Derivatives for Binding Affinity to Dopaminergic and Serotonergic Receptors: Insights from a Comprehensive Molecular Simulation
by Nasser Alotaiq and Doni Dermawan
Pharmaceuticals 2025, 18(4), 559; https://doi.org/10.3390/ph18040559 - 10 Apr 2025
Viewed by 1057
Abstract
Background/Objectives: Montelukast (MLK), a leukotriene receptor antagonist, has been associated with neuropsychiatric side effects. This study aimed to rationally modify MLK’s structure to reduce these risks by optimizing its interactions with dopamine D2 (DRD2) and serotonin 5-HT1A receptors using computational molecular simulation techniques. Methods: A library of MLK derivatives was designed and screened using structural similarity analysis, molecular docking, molecular dynamics (MD) simulations, MM/PBSA binding free energy calculations, and ADME-Tox predictions. Structural similarity analysis, based on Tanimoto coefficient fingerprinting, compared MLK derivatives to known neuropsychiatric drugs. Docking was performed to assess initial receptor binding, followed by 100 ns MD simulations to evaluate binding stability. MM/PBSA calculations quantified binding affinities, while ADME-Tox profiling predicted pharmacokinetic and toxicity risks. Results: Several MLK derivatives showed enhanced DRD2 and 5-HT1A binding. MLK_MOD-42 and MLK_MOD-43 emerged as the most promising candidates, exhibiting MM/PBSA binding free energies of −31.92 ± 2.54 kcal/mol and −27.37 ± 2.22 kcal/mol for DRD2 and −30.22 ± 2.29 kcal/mol and −28.19 ± 2.14 kcal/mol for 5-HT1A, respectively. Structural similarity analysis confirmed that these derivatives share key pharmacophoric features with atypical antipsychotics and anxiolytics. However, off-target interactions were not assessed, which may influence their overall safety profile. ADME-Tox analysis predicted improved oral bioavailability and lower neurotoxicity risks. Conclusions: MLK_MOD-42 and MLK_MOD-43 exhibit optimized receptor interactions and enhanced pharmacokinetics, suggesting potential neuropsychiatric applications. However, their safety and efficacy remain to be validated through in vitro and in vivo studies. 
Until such validation is performed, these derivatives should be considered as promising candidates with optimized receptor binding rather than confirmed safer alternatives.
(This article belongs to the Special Issue Application of 2D and 3D-QSAR Models in Drug Design)
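The abstract above relies on Tanimoto-coefficient fingerprint comparison. As a minimal sketch of that coefficient (not the authors' code), the snippet below treats a fingerprint as the set of its "on" bit positions; the function name and the example bit sets are hypothetical and do not correspond to any real compound.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        # Convention chosen for this sketch; other conventions exist for empty inputs.
        return 0.0
    shared = len(fp_a & fp_b)
    # |A and B| / |A or B|, with the union size obtained by inclusion-exclusion.
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical bit sets for a parent structure and one derivative.
parent = {1, 4, 7, 9, 12}
derivative = {1, 4, 7, 10, 12, 15}
print(round(tanimoto(parent, derivative), 3))  # 4 shared bits out of 7 -> 0.571
```

A value of 1.0 means the two bit sets are identical and 0.0 means they share no bits; the result depends on which fingerprint type produced the bits.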

18 pages, 3807 KiB  
Article
Dummy Template Molecularly Imprinted Polymers for Electrochemical Detection of Cardiac Troponin I: A Combined Computational and Experimental Approach
by Mohammad Sadegh Sadeghi Googheri, Davide Campagnol, Paolo Ugo, Samira Hozhabr Araghi and Najmeh Karimian
Chemosensors 2025, 13(1), 26; https://doi.org/10.3390/chemosensors13010026 - 20 Jan 2025
Cited by 2 | Viewed by 1643
Abstract
Cardiac troponin I (cTnI) is a crucial biomarker for the early detection of acute myocardial infarction (AMI), playing a significant role in cardiac health assessment. Molecularly imprinted polymers (MIPs) are valued for their stability, ease of fabrication, reusability, and selectivity. However, using the analyte as a template can be costly, especially if the analyte is expensive. In such cases, a dummy template (DT) with similar chemico-physical properties can be useful. This study aimed to design a DT-MIP for cTnI detection using cytochrome c (Cyt c) as the template, combining computational and experimental approaches. Molecular docking identified binding sites on Cyt c and cTnI for poly(o-phenylenediamine) (5PoPD) pentamers. Interactions and binding energies were examined using all-atom molecular dynamics (MD) simulations and structural interaction fingerprint (SIFt) calculations. A DT-MIP-modified electrode for cTnI detection was prepared by electropolymerizing o-PD in the presence of Cyt c as a dummy template. Electrochemical techniques monitored the electropolymerization, template removal, and binding of the target analyte. The experimental results showed that the DT-MIPs exhibited a high binding affinity for cTnI, consistent with the binding energies observed in MD simulations. The satisfactory correlation between experimental and computational results validated our model-based approach for the rational design of dummy template molecularly imprinted polymers.
(This article belongs to the Special Issue Recent Advances in Electrode Materials for Electrochemical Sensing)

20 pages, 6040 KiB  
Article
Harnessing the Power of Machine Learning Guided Discovery of NLRP3 Inhibitors Towards the Effective Treatment of Rheumatoid Arthritis
by Sidra Ilyas, Abdul Manan, Chanyoon Park, Hee-Geun Jo and Donghun Lee
Cells 2025, 14(1), 27; https://doi.org/10.3390/cells14010027 - 30 Dec 2024
Cited by 1 | Viewed by 1133
Abstract
The NLRP3 inflammasome plays a critical role in the pathogenesis of rheumatoid arthritis (RA) by activating inflammatory cytokines such as IL1β and IL18. Targeting NLRP3 has emerged as a promising therapeutic strategy for RA. In this study, a multidisciplinary approach combining machine learning, quantitative structure–activity relationship (QSAR) modeling, structure–activity landscape index (SALI), docking, molecular dynamics (MD), and molecular mechanics Poisson–Boltzmann surface area (MM/PBSA) calculations was employed to identify novel NLRP3 inhibitors. The ChEMBL database was used to retrieve compounds with known IC50 values to train machine learning (ML) models using the Lazy Predict package. After data pre-processing, 401 non-redundant structures were selected for exploratory data analysis (EDA). PubChem and MACCS fingerprints were used to predict the inhibitory activities of the compounds. SALI was used to identify structurally similar compounds with significantly different biological activities. The compounds were docked using MOE to assess their binding affinities and interactions with key residues in NLRP3. The models were evaluated, and a comparative analysis revealed that the ensemble Random Forest (RF) model (PubChem fingerprints) with RMSE (0.731), R² (0.622), and MAPE (8.988) and bootstrap aggregating model (MACCS fingerprints) with RMSE (0.687), R² (0.666), and MAPE (9.216) on the testing set performed well, in accordance with the Organization for Economic Cooperation and Development (OECD) guidelines. Out of all docked compounds, the two most promising compounds (ChEMBL5289544 and ChEMBL5219789) with binding scores of −7.5 and −8.2 kcal/mol were further investigated by MD to evaluate their stability and dynamic behavior within the binding site. MD simulations (200 ns) revealed strong structural stability, flexibility, and interactions in the selected complexes. 
MM/PBSA binding free energy calculations revealed that van der Waals and electrostatic forces were the key drivers of the binding of the protein with ligands. The outcomes obtained can be used to design more potent and selective NLRP3 inhibitors as therapeutic agents for the treatment of inflammatory diseases such as RA. However, concerns related to the lack of large datasets, experimental validation, and high computational costs remain.
(This article belongs to the Special Issue Novel Therapeutic Targets of Rheumatoid Arthritis)
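The abstract above reports RMSE, R², and MAPE for its regression models. As a hedged illustration of how these three metrics are commonly defined, the sketch below computes them in plain Python. The function name and the activity values are hypothetical, and expressing MAPE as a percentage is an assumption, since the abstract does not state its units.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, R^2, MAPE in percent) for paired observed and predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    # MAPE is undefined if any observed value is zero; this sketch does not guard against that.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    return rmse, r2, mape


# Hypothetical observed and predicted activities (illustrative numbers only).
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]
rmse, r2, mape = regression_metrics(observed, predicted)
print(f"RMSE={rmse:.3f}  R2={r2:.3f}  MAPE={mape:.2f}%")
```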

10 pages, 3528 KiB  
Article
A Terahertz Metasurface Sensor Based on Quasi-BIC for Detection of Additives in Infant Formula
by Mingjun Sun, Jie Lin, Ying Xue, Weijin Wang, Shengnan Shi, Shan Zhang and Yanpeng Shi
Nanomaterials 2024, 14(10), 883; https://doi.org/10.3390/nano14100883 - 19 May 2024
Cited by 5 | Viewed by 2046
Abstract
Prohibited additives in infant formula severely affect the health of infants. Terahertz (THz) spectroscopy has enormous application potential in analyte detection due to its rich fingerprint information content. However, there is limited research on the mixtures of multiple analytes. In this study, we propose a split ring metasurface that supports magnetic dipole bound states in the continuum (BIC). By breaking the symmetry, quasi-BIC with a high quality (Q) factor can be generated. Utilizing an angle-scanning strategy, the frequency of the resonance dip can be shifted, resulting in the plotting of an envelope curve which can reflect the molecular fingerprint of the analytes. Two prohibited additives found in infant formula, melamine and vanillin, can be identified in different proportions. Furthermore, a metric similar to the resolution in chromatographic analysis is introduced and calculated to be 0.61, indicating that these two additives can be detected simultaneously. Our research provides a new solution for detecting additives in infant formula.
(This article belongs to the Special Issue Nanomaterials for Terahertz Technology Applications)

18 pages, 8717 KiB  
Article
Comprehensive Similarity Algorithm and Molecular Dynamics Simulation-Assisted Terahertz Spectroscopy for Intelligent Matching Identification of Quorum Signal Molecules (N-Acyl-Homoserine Lactones)
by Lintong Zhang, Xiangzeng Kong, Fangfang Qu, Linjie Chen, Jinglin Li, Yilun Jiang, Chuxin Wang, Wenqing Zhang, Qiuhua Yang and Dapeng Ye
Int. J. Mol. Sci. 2024, 25(3), 1901; https://doi.org/10.3390/ijms25031901 - 5 Feb 2024
Viewed by 2111
Abstract
To investigate the mechanism of aquatic pathogens in quorum sensing (QS) and decode the signal transmission of aquatic Gram-negative pathogens, this paper proposes a novel method for the intelligent matching identification of eight quorum signaling molecules (N-acyl-homoserine lactones, AHLs) with similar molecular structures, using terahertz (THz) spectroscopy combined with molecular dynamics simulation and spectral similarity calculation. The THz fingerprint absorption spectral peaks of the eight AHLs were identified, attributed, and resolved using the density functional theory (DFT) for molecular dynamics simulation. To reduce the computational complexity of matching recognition, spectra with high peak matching values with the target were preliminarily selected, based on the peak position features of AHL samples. A comprehensive similarity calculation (CSC) method using a weighted improved Jaccard similarity algorithm (IJS) and discrete Fréchet distance algorithm (DFD) is proposed to calculate the similarity between the selected spectra and the targets, as well as to return the matching result with the highest accuracy. The results show that all AHL molecular types can be correctly identified, and the average quantization accuracy of CSC is 98.48%. This study provides a theoretical and data-supported foundation for the identification of AHLs, based on THz spectroscopy, and offers a new method for the high-throughput and automatic identification of AHLs.
(This article belongs to the Section Molecular Informatics)
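One component named in the abstract above is the discrete Fréchet distance (DFD). The sketch below shows the standard dynamic-programming form of that distance for two sampled curves, such as two spectra given as (frequency, absorbance) points. It is an illustration only: the abstract does not specify the weighting or the improved Jaccard (IJS) component of the CSC method, so those are not reproduced here, and the function name and sample points are hypothetical.

```python
import math


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two curves given as lists of (x, y) points."""
    n, m = len(p), len(q)
    # ca[i][j] holds the coupling distance for the prefixes p[:i+1] and q[:j+1].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_previous = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_previous, d)
    return ca[n - 1][m - 1]


# Two hypothetical sampled curves, offset vertically by one unit.
curve_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
curve_b = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
print(discrete_frechet(curve_a, curve_b))  # 1.0
```

A smaller distance indicates more similar curve shapes; how such a distance would be converted into a similarity score and combined with a Jaccard term is left open here.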

20 pages, 4426 KiB  
Article
The Chemical Space of Marine Antibacterials: Diphenyl Ethers, Benzophenones, Xanthones, and Anthraquinones
by José X. Soares, Inês Afonso, Adaleta Omerbasic, Daniela R. P. Loureiro, Madalena M. M. Pinto and Carlos M. M. Afonso
Molecules 2023, 28(10), 4073; https://doi.org/10.3390/molecules28104073 - 13 May 2023
Cited by 5 | Viewed by 2740
Abstract
The emergence of multiresistant bacteria and the shortage of antibacterials in the drug pipeline create the need to search for novel agents. Evolution drives the optimization of the structure of marine natural products to act as antibacterial agents. Polyketides are a vast and structurally diverse family of compounds that have been isolated from different marine microorganisms. Within the different polyketides, benzophenones, diphenyl ethers, anthraquinones, and xanthones have shown promising antibacterial activity. In this work, a dataset of 246 marine polyketides has been identified. In order to characterize the chemical space occupied by these marine polyketides, molecular descriptors and fingerprints were calculated. Molecular descriptors were analyzed according to the scaffold, and principal component analysis was performed to identify the relationships among the different descriptors. Generally, the identified marine polyketides are unsaturated, water-insoluble compounds. Among the different polyketides, diphenyl ethers tend to be more lipophilic and non-polar than the remaining classes. Molecular fingerprints were used to group the polyketides according to their molecular similarity into clusters. A total of 76 clusters were obtained, with a loose threshold for the Butina clustering algorithm, highlighting the large structural diversity of the marine polyketides. The large structural diversity was also evidenced by the visualization tree map assembled using the tree map (TMAP) unsupervised machine-learning method. The available antibacterial activity data were examined in terms of bacterial strains, and the activity data were used to rank the compounds according to their antibacterial potential. This potential ranking was used to identify the most promising compounds (four compounds) which can inspire the development of new structural analogs with better potency and absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties.
(This article belongs to the Section Medicinal Chemistry)
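The abstract above groups compounds with the Butina clustering algorithm. The following is a minimal pure-Python sketch of the usual Taylor-Butina procedure on set-based fingerprints, assuming Tanimoto distance (1 minus similarity) and a distance cutoff. It is not the authors' pipeline; the function names, cutoff value, and bit sets are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of 'on' bits."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 0.0)


def butina_clusters(fps, cutoff):
    """Cluster fingerprints; returns lists of indices, each led by its centroid."""
    n = len(fps)
    # Step 1: neighbor list of every item within the distance cutoff.
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i):
            if tanimoto_distance(fps[i], fps[j]) <= cutoff:
                neighbors[i].append(j)
                neighbors[j].append(i)
    # Step 2: visit items from most to fewest neighbors (ties broken by index).
    order = sorted(range(n), key=lambda idx: (-len(neighbors[idx]), idx))
    assigned, clusters = set(), []
    for centroid in order:
        if centroid in assigned:
            continue
        # Step 3: a cluster is the centroid plus its still-unassigned neighbors.
        members = [centroid] + [k for k in neighbors[centroid] if k not in assigned]
        assigned.update(members)
        clusters.append(members)
    return clusters


# Hypothetical fingerprints: two loose families and one singleton.
fps = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 3, 4, 5}, {10, 11, 12}, {10, 11, 13}, {20, 21}]
print(butina_clusters(fps, cutoff=0.6))  # [[0, 1, 2], [3, 4], [5]]
```

Raising the cutoff merges more compounds into each cluster, while lowering it produces more and smaller clusters.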

23 pages, 11458 KiB  
Article
Comparative Assessment of Docking Programs for Docking and Virtual Screening of Ribosomal Oxazolidinone Antibacterial Agents
by McKenna E. Buckley, Audrey R. N. Ndukwe, Pramod C. Nair, Santu Rana, Kathryn E. Fairfull-Smith and Neha S. Gandhi
Antibiotics 2023, 12(3), 463; https://doi.org/10.3390/antibiotics12030463 - 24 Feb 2023
Cited by 7 | Viewed by 4935
Abstract
Oxazolidinones are a broad-spectrum class of synthetic antibiotics that bind to the 50S ribosomal subunit of Gram-positive and Gram-negative bacteria. Many crystal structures of the ribosomes with oxazolidinone ligands have been reported in the literature, facilitating structure-based design using methods such as molecular docking. It would be of great interest to know in advance how well docking methods can reproduce the correct ligand binding modes and rank these correctly. We examined the performance of five molecular docking programs (AutoDock 4, AutoDock Vina, DOCK 6, rDock, and RLDock) for their ability to model ribosomal–ligand interactions with oxazolidinones. Eleven ribosomal crystal structures with oxazolidinones as the ligands were docked. The accuracy was evaluated by calculating the docked complexes’ root-mean-square deviation (RMSD) and the program’s internal scoring function. The rankings for each program based on the median RMSD between the native and predicted were DOCK 6 > AD4 > Vina > RDOCK >> RLDOCK. Results demonstrate that the top-performing program, DOCK 6, could accurately replicate the ligand binding in only four of the eleven ribosomes due to the poor electron density of said ribosomal structures. In this study, we have further benchmarked the performance of the DOCK 6 docking algorithm and scoring in improving virtual screening (VS) enrichment using the dataset of 285 oxazolidinone derivatives against oxazolidinone binding sites in the S. aureus ribosome. However, there was no clear trend between the structure and activity of the oxazolidinones in VS. Overall, the docking performance indicates that the RNA pocket’s high flexibility does not allow for accurate docking prediction, highlighting the need to validate VS protocols for ligand-RNA before future use. 
Later, we developed a re-scoring method incorporating absolute docking scores and molecular descriptors, and the results indicate that the descriptors greatly improve the correlation of docking scores and pMIC values. Morgan fingerprint analysis was also used, suggesting that DOCK 6 underpredicted molecules with tail modifications with acetamide, n-methylacetamide, or n-ethylacetamide and over-predicted molecule derivatives with methylamino bits. Alternatively, a ligand-based approach similar to a field template was taken, indicating that each derivative’s tail groups have strong positive and negative electrostatic potential contributing to microbial activity. These results indicate that one should perform VS campaigns of ribosomal antibiotics with care and that more comprehensive strategies, including molecular dynamics simulations and relative free energy calculations, might be necessary in conjunction with VS and docking.
(This article belongs to the Special Issue Ribosomal Antibiotics: Recent Advances)
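The abstract above scores docking accuracy by the RMSD between native and predicted ligand poses. As a rough sketch, the function below computes a plain coordinate RMSD, assuming both poses list the same atoms in the same order and share the receptor's coordinate frame (no superposition and no symmetry correction, which real docking tools may apply). The function name and coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical crystal pose and a docked pose shifted by 2 units along z.
native_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
docked_pose = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0), (0.0, 1.0, 2.0)]
print(rmsd(native_pose, docked_pose))  # 2.0
```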

15 pages, 2179 KiB  
Article
Machine Learning-Based Retention Time Prediction of Trimethylsilyl Derivatives of Metabolites
by Sara M. de Cripan, Adrià Cereto-Massagué, Pol Herrero, Andrei Barcaru, Núria Canela and Xavier Domingo-Almenara
Biomedicines 2022, 10(4), 879; https://doi.org/10.3390/biomedicines10040879 - 11 Apr 2022
Cited by 15 | Viewed by 3587
Abstract
In gas chromatography–mass spectrometry-based untargeted metabolomics, metabolites are identified by comparing mass spectra and chromatographic retention time with reference databases or standard materials. In that sense, machine learning has been used to predict the retention time of metabolites lacking reference data. However, the retention time prediction of trimethylsilyl derivatives of metabolites, typically analyzed in untargeted metabolomics using gas chromatography, has been poorly explored. Here, we provide a rationalized framework for machine learning-based retention time prediction of trimethylsilyl derivatives of metabolites in gas chromatography. We compared different machine learning paradigms, in addition to exploring the influence of the computational molecular structure representation to train the prediction models: fingerprint class and fingerprint calculation software. Our study challenged predicted retention time when using chemical ionization and electron impact ionization sources in simulated and real cases, demonstrating a good correct identity ranking capability by machine learning, despite observing a limited false identity filtering power in cases where a spectrum or a monoisotopic mass matches multiple candidates. Specifically, machine learning prediction yielded median absolute and relative retention index (relative retention time) errors of 37.1 retention index units and 2%, respectively. In addition, fingerprint class and fingerprint calculation software, as well as the molecular structural similarity between the training and test or real case sets, were shown to be critical modulators of the prediction performance. Finally, we leveraged the structural similarity between the training and test or real case set to determine the probability that the prediction error is below a specific threshold. 
Overall, our study demonstrates that predicted retention time can provide insights into the true structure of unknown metabolites by ranking from the most to the least plausible molecular identity, and sets the guidelines to assess the confidence in metabolite identification using predicted retention time data.
(This article belongs to the Special Issue Omics Data Analysis and Integration in Complex Diseases)

17 pages, 5438 KiB  
Article
Identification of Corosolic and Oleanolic Acids as Molecules Antagonizing the Human RORγT Nuclear Receptor Using the Calculated Fingerprints of the Molecular Similarity
by Joanna Pastwińska, Kaja Karaś, Anna Sałkowska, Iwona Karwaciak, Katarzyna Chałaśkiewicz, Błażej A. Wojtczak, Rafał A. Bachorz and Marcin Ratajewski
Int. J. Mol. Sci. 2022, 23(3), 1906; https://doi.org/10.3390/ijms23031906 - 8 Feb 2022
Cited by 8 | Viewed by 3498
Abstract
RORγT is a protein product of the RORC gene belonging to the nuclear receptor subfamily of retinoic-acid-receptor-related orphan receptors (RORs). RORγT is preferentially expressed in Th17 lymphocytes and drives their differentiation from naive CD4+ cells and is involved in the regulation of the expression of numerous Th17-specific cytokines, such as IL-17. Because Th17 cells are implicated in the pathology of autoimmune diseases (e.g., psoriasis, inflammatory bowel disease, multiple sclerosis), RORγT, whose activity is regulated by ligands, has been recognized as a drug target in potential therapies against these diseases. The identification of such ligands is time-consuming and usually requires the screening of chemical libraries. Herein, using a Tanimoto similarity search, we found corosolic acid and other pentacyclic triterpenes in the library we previously screened as compounds highly similar to the RORγT inverse agonist ursolic acid. Furthermore, using gene reporter assays and Th17 lymphocytes, we distinguished compounds that exert stronger biological effects (ursolic, corosolic, and oleanolic acid) from those that are ineffective (asiatic and maslinic acids), providing evidence that such combinatorial methodology (in silico and experimental) might help wet screenings to achieve more accurate results, eliminating false negatives.
(This article belongs to the Special Issue Drug Design and Virtual Screening)

7 pages, 968 KiB  
Article
The Chemical Property Position of Bedaquiline Construed by a Chemical Global Positioning System-Natural Product
by Muaaz Mutaz Alajlani
Molecules 2022, 27(3), 753; https://doi.org/10.3390/molecules27030753 - 24 Jan 2022
Cited by 5 | Viewed by 3649
Abstract
Bedaquiline is a novel adenosine triphosphate synthase inhibitor anti-tuberculosis drug. Bedaquiline belongs to the class of diarylquinolines, which are antituberculosis drugs that are quite different mechanistically from quinolines and fluoroquinolones. The fact that relatively similar chemical drugs produce different mechanisms of action is still not widely understood. To enhance discrimination in favor of bedaquiline, a new approach using eight-score principal component analysis (PCA), provided by a ChemGPS-NP model, is proposed. PCA scores were calculated based on 35 + 1 different physicochemical properties and demonstrated clear differences when compared with other quinolines. The ChemGPS-NP model provided an exceptional 100 compounds nearest to bedaquiline from antituberculosis screening sets (with a cumulative Euclidean distance of 196.83), compared with the different 2D similarity provided by Tanimoto methods (extended-connectivity fingerprints and the Molecular ACCess System, showing 30% and 182% increases in cumulative Euclidean distance, respectively). Potentially similar compounds from publicly available antituberculosis compounds and Maybridge sets, based on bedaquiline’s eight-dimensional similarity and different filtrations, were identified too.
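The abstract above compares methods by the cumulative Euclidean distance of the nearest compounds in an eight-dimensional score space. The sketch below illustrates one plausible reading of that quantity: rank library compounds by distance to a query vector, keep the k nearest, and sum their distances. The function names, compound labels, and score vectors are all hypothetical and are not ChemGPS-NP output.

```python
import math


def nearest_neighbors(query, library, k):
    """Return the k (name, distance) pairs closest to the query in Euclidean distance."""
    scored = [(name, math.dist(query, vector)) for name, vector in library.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


def cumulative_distance(neighbors):
    """Sum of the distances of the selected neighbors to the query."""
    return sum(distance for _, distance in neighbors)


# Hypothetical eight-dimensional score vectors (illustrative values only).
query_scores = (0.0,) * 8
library_scores = {
    "cmpd_A": (3.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),  # distance 5
    "cmpd_B": (1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),  # distance 1
    "cmpd_C": (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 6.0, 8.0),  # distance 10
    "cmpd_D": (0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),  # distance 2
}
closest = nearest_neighbors(query_scores, library_scores, k=2)
print(closest, cumulative_distance(closest))
```

Under this reading, a lower cumulative distance for the same k means the selected set lies closer to the query in the chosen property space.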

17 pages, 2617 KiB  
Article
VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder
by Soumitra Samanta, Steve O’Hagan, Neil Swainston, Timothy J. Roberts and Douglas B. Kell
Molecules 2020, 25(15), 3446; https://doi.org/10.3390/molecules25153446 - 29 Jul 2020
Cited by 32 | Viewed by 8691
Abstract
Molecular similarity is an elusive but core “unsupervised” cheminformatics concept, yet different “fingerprint” encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are “better” than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the joint distribution p(z|x) where z is a latent vector and x are the (same) input/output data. It takes the form of a “bowtie”-shaped artificial neural network. In the middle is a “bottleneck layer” or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a rapid and novel metric for molecular similarity that is both easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
(This article belongs to the Section Chemical Biology)

9 pages, 303 KiB  
Communication
A Structural Hierarchy Matching Approach for Molecular Similarity/Substructure Searching
by Shu-Shen Ji, Hong-Ju Dong, Xin-Xin Zhou, Ya-Min Liu, Feng-Xue Zhang, Qi Wang and Xin-An Huang
Molecules 2015, 20(5), 8791-8799; https://doi.org/10.3390/molecules20058791 - 15 May 2015
Viewed by 5429
Abstract
An approach for molecular similarity/substructure searching based on structural hierarchy matching is proposed. In this approach, small molecules are divided into two categories, acyclic and cyclic forms. The latter are further divided into three structural hierarchies, namely, framework, complicated-, and mono-rings. During searching, the similarity coefficients of a structural query and each retrieved molecule are calculated using the hierarchy of the query as the reference. A total of 13,911 chemicals were involved in this work, from which the minimal cyclic and acyclic substructures are extracted, and further processed into fuzzy structural fingerprints. Subsequently, the fingerprints are used as the searching indices for molecular similarity or substructure searching. The tests show that this approach can give user options to choose between one-substructure and multi-substructure searching with sorted results. Moreover, this algorithm has the potential to be developed for molecular similarity searching and substructure analysis.
(This article belongs to the Section Molecular Diversity)