Special Issue "Big Data Analysis in Biomolecular Research, Bioinformatics, and Systems Biology with Complex Networks and Multi-Label Machine Learning Models"

A special issue of Biomolecules (ISSN 2218-273X). This special issue belongs to the section "Bioinformatics and Systems Biology".

Deadline for manuscript submissions: closed (31 July 2020).

Special Issue Information

Dear Colleagues,

Modern experimental techniques used in biomolecular research produce large amounts of data. These techniques include next-generation sequencing, molecular NMR, iNMR imaging, 2DE and MS in proteomics, and EEG in neurosciences. The data produced, sometimes called big data, have been collected in public online databases (e.g., ChEMBL, GenBank, PDB, PubChem, KEGG, NLM, and AIDSvu). These big data sets may give important clues for knowledge discovery, translational research, and personalized medicine if we can analyze them properly. This in turn may result in the development of new applications in omics, drug discovery, vaccine design, biomarker discovery, neurosciences, biomedical engineering, etc.

However, most of these big data sets present certain features that make the analysis difficult. We can summarize many of these problems, briefly, as big data = 5V + C data features. The 5Vs are problems with data volume, veracity, variability, velocity, and value. The C refers to the complexity of the data, due in part to the high number of interconnections among variables in the complex systems studied; in Systems Biology, such big data sets form complex networks. Examples of these complex networks include drug–target networks (multiple drugs interacting with different target proteins), protein–protein interaction networks (PINs), gene–gene regulatory networks (GRNs), etc.

In this context, we may need complex network analysis tools to capture the complexity of the data and multi-label machine learning (ML) algorithms to find predictive models for systems with multiple biological properties (IC50, Ki, Km, LD50, etc.) and multiple labels (drugs, proteins, cell lines, tissues, brain regions, organisms, populations, etc.).

Last but not least, the use of all these computational techniques to process biomolecular data becomes even more important if we develop computational biomedical engineering systems for translational and personalized medicine. Consequently, ML algorithms have to merge data from preclinical assays (as in the ChEMBL database) with data from clinical assays that contain personal data. In this sense, the use of these data analysis tools in biomolecular sciences also has to consider the legal aspects relevant to personal data protection, software copyright, etc.; see, e.g., GDPR in Europe, REACH, and OECD regulations.

Consequently, in this new Special Issue, we propose to open a forum for the publication of technical aspects and new applications or results (software, databases, cheminformatic models, machine learning algorithms, and complex network tools), as well as for the discussion of the ethical and legal implications of these tools.

The present Special Issue is also associated with MOL2NET-05, the International Conference on Multidisciplinary Sciences, ISSN: 2624-5078, MDPI SciForum, Basel, Switzerland, 2019. The conference is headquartered at the University of the Basque Country (UPV/EHU) and is supported by professors from Ikerbasque: Basque Foundation for Sciences, Harvard Medical School, UNC Chapel Hill, EMBL-EBI United Kingdom, CNAM Paris, Miami Dade College (MDC), University of Coruña (UDC), etc. The MOL2NET series hosts more than 10 workshops every year, with in-person and/or online participation, at universities in the USA, Europe, Brazil, China, India, etc. In addition, the conference hosts USEDAT: USA-Europe Data Analysis Training School, focused on training students worldwide in data analysis, with an emphasis on cheminformatics. The members of the committee have also guest-edited other Special Issues in multiple MDPI journals. Please see the conference website at https://mol2net-05.sciforum.net/

We especially encourage colleagues worldwide to submit short communications to the conference and complete versions (full papers) to the present Special Issue.

Prof. Dr. Humbert González-Díaz
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biomolecules is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • bioinformatics
  • complex networks
  • systems biology
  • machine learning
  • cheminformatics
  • QSAR

Published Papers (13 papers)

Research

Open Access Article
Molecular Topology for the Discovery of New Broad-Spectrum Antibacterial Drugs
Biomolecules 2020, 10(9), 1343; https://doi.org/10.3390/biom10091343 - 19 Sep 2020
Abstract
In this study, molecular topology was used to develop several discriminant equations capable of classifying compounds according to their antibacterial activity. Topological indices were used as structural descriptors and their relation to antibacterial activity was determined by applying linear discriminant analysis (LDA) on a group of quinolones and quinolone-like compounds. Four equations were constructed, named DF1, DF2, DF3, and DF4, all with good statistical parameters such as Fisher–Snedecor’s F (over 25 in all cases), Wilk’s lambda (below 0.36 in all cases) and percentage of correct classification (over 80% in all cases), which allows a reliable extrapolation prediction of antibacterial activity in any organic compound. From the four discriminant functions, it can be extracted that the presence of sp3 carbons, ramifications, and secondary amine groups in a molecule enhance antibacterial activity, whereas the presence of 5-member rings, sp2 carbons, and sp2 oxygens hinder it. The results obtained clearly reveal the high efficiency of combining molecular topology with LDA for the prediction of antibacterial activity. Full article

Open Access Article
Ranking Series of Cancer-Related Gene Expression Data by Means of the Superposing Significant Interaction Rules Method
Biomolecules 2020, 10(9), 1293; https://doi.org/10.3390/biom10091293 - 08 Sep 2020
Abstract
The Superposing Significant Interaction Rules (SSIR) method is a combinatorial procedure that deals with symbolic descriptors of samples. It is able to rank the series of samples when those items are classified into two classes. The method selects preferential descriptors and, with them, generates rules that make up the rank by means of a simple voting procedure. Here, two application examples are provided. In both cases, binary or multilevel strings encoding gene expressions are considered as descriptors. It is shown how the SSIR procedure is useful for ranking the series of patient transcription data to diagnose two types of cancer (leukemia and prostate cancer) obtaining Area Under Receiver Operating Characteristic (AU-ROC) values of 0.95 (leukemia prediction) and 0.80–0.90 (prostate). The preferential selected descriptors here are specific gene expressions, and this is potentially useful to point to possible key genes. Full article

Open Access Article
Different Research Approaches in Unraveling the Venom Proteome of Naja ashei
Biomolecules 2020, 10(9), 1282; https://doi.org/10.3390/biom10091282 - 05 Sep 2020
Abstract
The dynamic development of venomics in recent years has resulted in a significant increase in publicly available proteomic data. The information contained therein is often used for comparisons between different datasets and to draw biological conclusions therefrom. In this article, we aimed to show the possible differences that can arise, in the final results of the proteomic experiment, while using different research workflows. We applied two software solutions (PeptideShaker and MaxQuant) to process data from shotgun LC-MS/MS analysis of Naja ashei venom and collate it with the previous report concerning this species. We were able to provide new information regarding the protein composition of this venom but also present the qualitative and quantitative limitations of currently used proteomic methods. Moreover, we reported a rapid and straightforward technique for the separation of the fraction of proteins from the three-finger toxin family. Our results underline the necessary caution in the interpretation of data based on a comparative analysis of data derived from different studies. Full article

Open Access Article
The Skeleton in the Closet: Faults and Strengths of Public Versus Private Genetic Biobanks
Biomolecules 2020, 10(9), 1273; https://doi.org/10.3390/biom10091273 - 03 Sep 2020
Abstract
Direct-to-consumer (DTC) genetic testing has been a major ethical controversy related to clinical utility, the availability of pre- and post-genetic counseling, privacy concerns, and the risk of discrimination and stigmatization. The development of direct-to-consumer genetic testing cannot leave aside some considerations on how the samples are managed once the analyses have been completed and the customer has received a response. The possibility that these samples are maintained by the structure for future research uses, explains the definition, which has been proposed in the literature, of these structures such as private genetic biobanks. The most relevant aspects that may impact ethical aspects, allowing a comparison between the public and private dimensions of genetic biobanks, are mainly transparency and participant/donor trust. The article aims to analyze the main line of ethical debate related to the mentioned practices and to explore whether market-based and consumer rights regarding DTC genetic testing can be counterbalanced by healthcare system developments based on policies that encourage the donation of samples in the context of public biobanks. A platform for dialogue, both technical–scientific and ethical, is indispensable between the public sector, the private sector and citizens to truly maximize both transparency and public trust in both contexts. Full article

Open Access Article
Concise Polygenic Models for Cancer-Specific Identification of Drug-Sensitive Tumors from Their Multi-Omics Profiles
Biomolecules 2020, 10(6), 963; https://doi.org/10.3390/biom10060963 - 26 Jun 2020
Cited by 1
Abstract
In silico models to predict which tumors will respond to a given drug are necessary for Precision Oncology. However, predictive models are only available for a handful of cases (each case being a given drug acting on tumors of a specific cancer type). A way to generate predictive models for the remaining cases is with suitable machine learning algorithms that are yet to be applied to existing in vitro pharmacogenomics datasets. Here, we apply XGBoost integrated with a stringent feature selection approach, which is an algorithm that is advantageous for these high-dimensional problems. Thus, we identified and validated 118 predictive models for 62 drugs across five cancer types by exploiting four molecular profiles (sequence mutations, copy-number alterations, gene expression, and DNA methylation). Predictive models were found in each cancer type and with every molecular profile. On average, no omics profile or cancer type obtained models with higher predictive accuracy than the rest. However, within a given cancer type, some molecular profiles were overrepresented among predictive models. For instance, CNA profiles were predictive in breast invasive carcinoma (BRCA) cell lines, but not in small cell lung cancer (SCLC) cell lines where gene expression (GEX) and DNA methylation profiles were the most predictive. Lastly, we identified the best XGBoost model per cancer type and analyzed their selected features. For each model, some of the genes in the selected list had already been found to be individually linked to the response to that drug, providing additional evidence of the usefulness of these models and the merits of the feature selection scheme. Full article

Open Access Article
Investigation of Volatiles in Cork Samples Using Chromatographic Data and the Superposing Significant Interaction Rules (SSIR) Chemometric Tool
Biomolecules 2020, 10(6), 896; https://doi.org/10.3390/biom10060896 - 11 Jun 2020
Cited by 1
Abstract
This study describes a new chemometric tool for the identification of relevant volatile compounds in cork by untargeted headspace solid phase microextraction and gas chromatography mass spectrometry (HS-SPME/GC-MS) analysis. The production process in cork industries commonly includes a washing procedure based on water and temperature cycles in order to reduce off-flavors and decrease the amount of trichloroanisole (TCA) in cork samples. The treatment has been demonstrated to be effective for the designed purpose, but chemical changes in the volatile fraction of the cork sample are produced, which need to be further investigated through the chemometric examination of data obtained from the headspace. Ordinary principal component analysis (PCA) based on the numerical description provided by the chromatographic area of several target compounds was inconclusive. This led us to consider a new tool, which is presented here for the first time for an application in the chromatographic field. The superposing significant interaction rules (SSIR) method is a variable selector which directly analyses the raw internal data coming from the spectrophotometer software and, combined with PCA and discriminant analysis, has been able to separate a group of 56 cork samples into two groups: treated and non-treated. This procedure revealed the presence of two compounds, furfural and 5-methylfurfural, which are increased in the case of treated samples. These compounds explain the sweet notes found in the sensory evaluation of the treated corks. The model that is obtained is robust; the overall sensitivity and specificity are 96% and 100%, respectively. Furthermore, a leave-one-out cross-validation calculation revealed that all of the samples can be correctly classified one at a time if three or more PCA descriptors are considered. Full article

Open Access Article
A Comparison between Several Response Surface Methodology Designs and a Neural Network Model to Optimise the Oxidation Conditions of a Lignocellulosic Blend
Biomolecules 2020, 10(5), 787; https://doi.org/10.3390/biom10050787 - 19 May 2020
Abstract
In this paper, response surface methodology (RSM) designs and an artificial neural network (ANN) are used to obtain the optimal conditions for the oxy-combustion of a corn–rape blend. The ignition temperature (Te) and burnout index (Df) were selected as the responses to be optimised, while the CO2/O2 molar ratio, the total flow, and the proportion of rape in the blend were chosen as the influencing factors. For the RSM designs, complete, Box–Behnken, and central composite designs were performed to assess the experimental results. By applying the RSM, it was found that the principal effects of the three factors were statistically significant to compute both responses. Only the interactions of the factors on Df were successfully described by the Box–Behnken model, while the complete design model was adequate to describe such interactions on both responses. The central composite design was found to be inadequate to describe the factor interactions. Nevertheless, the three methods predicted the optimal conditions properly, due to the cancellation of net positive and negative errors in the mathematical adjustment. The ANN presented the highest regression coefficient of all methods tested and needed only 20 experiments to reach the best predictions, compared with the 32 experiments needed by the best RSM method. Hence, the ANN was found to be the most efficient model, in terms of good prediction ability and a low resource requirement. Finally, the optimum point was found to be a CO2/O2 molar ratio of 3.3, a total flow of 108 mL/min, and 61% of rape in the biomass blend. Full article

Open Access Article
Exploring Alzheimer’s Disease Molecular Variability via Calculation of Personalized Transcriptional Signatures
Biomolecules 2020, 10(4), 503; https://doi.org/10.3390/biom10040503 - 26 Mar 2020
Abstract
Despite huge investments and major efforts to develop remedies for Alzheimer’s disease (AD) in the past decades, AD remains incurable. While evidence for molecular and phenotypic variability in AD have been accumulating, AD research still heavily relies on the search for AD-specific genetic/protein biomarkers that are expected to exhibit repetitive patterns throughout all patients. Thus, the classification of AD patients to different categories is expected to set the basis for the development of therapies that will be beneficial for subpopulations of patients. Here we explore the molecular heterogeneity among a large cohort of AD and non-demented brain samples, aiming to address the question whether AD-specific molecular biomarkers can progress our understanding of the disease and advance the development of anti-AD therapeutics. We studied 951 brain samples, obtained from up to 17 brain regions of 85 AD patients and 22 non-demented subjects. Utilizing an information-theoretic approach, we deciphered the brain sample-specific structures of altered transcriptional networks. Our in-depth analysis revealed that 7 subnetworks were repetitive in the 737 diseased and 214 non-demented brain samples. Each sample was characterized by a subset consisting of ~1–3 subnetworks out of 7, generating 52 distinct altered transcriptional signatures that characterized the 951 samples. We show that 30 different altered transcriptional signatures characterized solely AD samples and were not found in any of the non-demented samples. In contrast, the rest of the signatures characterized different subsets of sample types, demonstrating the high molecular variability and complexity of gene expression in AD. Importantly, different AD patients exhibiting similar expression levels of AD biomarkers harbored distinct altered transcriptional networks. 
Our results emphasize the need to expand the biomarker-based stratification to patient-specific transcriptional signature identification for improved AD diagnosis and for the development of subclass-specific future treatment. Full article

Open Access Article
Kernel Differential Subgraph Analysis to Reveal the Key Period Affecting Glioblastoma
Biomolecules 2020, 10(2), 318; https://doi.org/10.3390/biom10020318 - 17 Feb 2020
Abstract
Glioblastoma (GBM) is a fast-growing type of malignant primary brain tumor. To explore the mechanisms in GBM, complex biological networks are used to reveal crucial changes among different biological states, which reflect on the development of living organisms. It is critical to discover the kernel differential subgraph (KDS) that leads to drastic changes. However, identifying the KDS is similar to the Steiner Tree problem, which is NP-hard. In this paper, we developed a criterion to explore the KDS (CKDS), which considered the connectivity and scale of the KDS, the topological difference of nodes, and the functional relevance between genes in the KDS. The CKDS algorithm was applied to simulated datasets and three single-cell RNA sequencing (scRNA-seq) datasets including GBM, fetal human cortical neurons (FHCN) and neural differentiation. Then we performed the network topology and functional enrichment analyses on the extracted KDSs. Compared with state-of-the-art methods, the CKDS algorithm performed better at discovering the KDSs on simulated datasets. In the GBM and FHCN, seventeen genes (one biomarker, nine regulatory genes, one driver gene, six therapeutic targets) and KEGG pathways in KDSs were strongly supported by literature mining as being highly interrelated with GBM. Moreover, focusing on GBM, there were fifteen genes (including ten regulatory genes, three driver genes, one biomarker, one therapeutic target) and KEGG pathways found in the KDS of the neural differentiation process from activated neural stem cells (aNSC) to neural progenitor cells (NPC), while few genes and no pathway were found in the period from NPC to astrocytes (Ast). These experiments indicated that the process from aNSC to NPC is a key differentiation period affecting the development of GBM. Therefore, the CKDS algorithm provides a unique perspective in identifying cell-type-specific genes and KDSs. Full article

Open Access Article
Artificial Neural Network (ANN) as a Tool to Reduce Human-Animal Interaction Improves Senegalese Sole Production
Biomolecules 2019, 9(12), 778; https://doi.org/10.3390/biom9120778 - 25 Nov 2019
Abstract
Manipulation is usually required for biomass calculation and food estimation for optimal fish growth in production facilities. However, the advances in computer-based systems have opened a new range of applied possibilities. In this study we used image analysis and a neural network algorithm that allowed us to successfully provide highly accurate biomass data. This developed system allowed us to compare the effects of reduced levels of human-animal interaction on the culture of adult Senegalese sole (Solea senegalensis) in terms of body weight gain. For this purpose, 30 adult fish were split into two homogeneous groups formed by three replicates (n = 5) each: a control group (CTRL), which was standard manipulated and an experimental group (EXP), which was maintained under a lower human-animal interaction culture using our system for biomass calculation. Visible implant elastomer was, for the first time, applied as tagging technology for tracking soles during the experiment (four months). The experimental group achieved a statistically significant weight gain (p < 0.0100) while CTRL animals did not report a statistical before-after weight increase. Individual body weight increment was lower (p < 0.0100) in standard-handled animals. In conclusion, our experimental approach provides evidence that our developed system for biomass calculation, which implies lower human-animal interaction, improves biomass gain in Senegalese sole individuals in a short period of time. Full article

Open Access Article
Dynamical Rearrangement of Human Epidermal Growth Factor Receptor 2 upon Antibody Binding: Effects on the Dimerization
Biomolecules 2019, 9(11), 706; https://doi.org/10.3390/biom9110706 - 05 Nov 2019
Abstract
Human epidermal growth factor 2 (HER2) is a ligand-free tyrosine kinase receptor of the HER family that is overexpressed in some of the most aggressive tumours. Although it is known that HER2 dimerization involves a specific region of its extracellular domain, the so-called “dimerization arm”, the mechanism of dimerization inhibition remains uncertain. However, uncovering how antibody interactions lead to inhibition of HER2 dimerization is of key importance in understanding its role in tumour progression and therapy. Herein, we employed several computational modelling techniques for a molecular-level understanding of the interactions between HER and specific anti-HER2 antibodies, namely an antigen-binding (Fab) fragment (F0178) and a single-chain variable fragment from Trastuzumab (scFv). Specifically, we investigated the effects of antibody-HER2 interactions on the key residues of “dimerization arm” from molecular dynamics (MD) simulations of unbound HER (in a total of 1 µs), as well as ScFv:HER2 and F0178:HER2 complexes (for a total of 2.5 µs). A deep surface analysis of HER receptor revealed that the binding of specific anti-HER2 antibodies induced conformational changes both in the interfacial residues, which was expected, and in the ECDII (extracellular domain), in particular at the “dimerization arm”, which is critical in establishing protein–protein interface (PPI) interactions. Our results support and advance the knowledge on the already described trastuzumab effect on blocking HER2 dimerization through synergistic inhibition and/or steric hindrance. Furthermore, our approach offers a new strategy for fine-tuning target activity through allosteric ligands. Full article

Open Access Article
A Computational Toxicology Approach to Screen the Hepatotoxic Ingredients in Traditional Chinese Medicines: Polygonum multiflorum Thunb as a Case Study
Biomolecules 2019, 9(10), 577; https://doi.org/10.3390/biom9100577 - 07 Oct 2019
Cited by 2
Abstract
In recent years, liver injury induced by Traditional Chinese Medicines (TCMs) has gained increasing attention worldwide. Assessing the hepatotoxicity of compounds in TCMs is essential and inevitable for both doctors and regulatory agencies. However, until now there has been no effective method available to screen the hepatotoxic ingredients in TCMs. In the present study, we initially built a large-scale dataset of drug-induced liver injuries (DILIs). Then, 13 types of molecular fingerprints/descriptors and eight machine learning algorithms were utilized to develop single classifiers for DILI, which resulted in 5416 single classifiers. Next, the NaiveBayes algorithm was adopted to integrate the best single classifier of each machine learning algorithm, by which we attempted to build a combined classifier. The accuracy, sensitivity, specificity, and area under the curve of the combined classifier were 72.798, 0.732, 0.724, and 0.793, respectively. Compared to several prior studies, the combined classifier provided better performance both in cross validation and external validation. In our prior study, we developed a herb-hepatotoxic ingredient network and a herb-induced liver injury (HILI) dataset based on pre-clinical evidence published in the scientific literature. Herein, by combining that and the combined classifier developed in this work, we proposed the first instance of a computational toxicology approach to screen the hepatotoxic ingredients in TCMs. Then Polygonum multiflorum Thunb (PmT) was used as a case to investigate the reliability of the approach proposed. Consequently, a total of 25 ingredients in PmT were identified as hepatotoxicants. The results were highly consistent with records in the literature, indicating that our computational toxicology approach is reliable and effective for the screening of hepatotoxic ingredients in PmT.
The combined classifier developed in this work can be used to assess the hepatotoxic risk of both natural compounds and synthetic drugs. The computational toxicology approach presented in this work will assist with screening the hepatotoxic ingredients in TCMs, which will further lay the foundation for exploring the hepatotoxic mechanisms of TCMs. In addition, the method proposed in this work can be applied to research focused on other adverse effects of TCMs/synthetic drugs. Full article

Review

Open Access Review
Graph Theory-Based Sequence Descriptors as Remote Homology Predictors
Biomolecules 2020, 10(1), 26; https://doi.org/10.3390/biom10010026 - 23 Dec 2019
Cited by 2
Abstract
Alignment-free (AF) methodologies have increased in popularity in the last decades as alternative tools to alignment-based (AB) algorithms for performing comparative sequence analyses. They have been especially useful to detect remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular alignment-free methodologies, as well as their applications to classification problems, have been described in previous reviews. Despite a new set of graph theory-derived sequence/structural descriptors that have been gaining relevance in the detection of remote homology, they have been omitted as AF predictors when the topic is addressed. Here, we first go over the most popular AF approaches used for detecting homology signals within the twilight zone and then bring out the state-of-the-art tools encoding graph theory-derived sequence/structure descriptors and their success for identifying remote homologs. We also highlight the tendency of integrating AF features/measures with the AB ones, either into the same prediction model or by assembling the predictions from different algorithms using voting/weighting strategies, for improving the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the achieved experiences in remote homology detection by both the most popular AF tools and other less known ones, we provide our own using the graphical–numerical methodologies, MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria. Full article