Special Issue "Selected Papers from the Bioinformatics and Intelligent Information Processing Conference (BIIP2018)"

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Technologies and Resources for Genetics".

Deadline for manuscript submissions: closed (20 July 2018)

Special Issue Editor

Guest Editor
Prof. Dr. Quan Zou

School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
Interests: bioinformatics; molecular computing; sequence alignment; systems biology

Special Issue Information

Dear Colleagues,

The Bioinformatics and Intelligent Information Processing Conference (BIIP2018), organized by the China Association of Artificial Intelligence, will be held in Tianjin, China, June 15–17, 2018. The conference is supported and sponsored by Tianjin University.

Bioinformatics has become an intensive research topic over the past decade and has attracted many leading scientists working in biology, physics, mathematics and computer science. Optimization, statistics, algorithms, and many other informatics methods have been widely used in the field.

Following the successful BIIP conference series, the purpose of BIIP 2018 is to extend the international forum for scientists, researchers, educators, and practitioners to exchange ideas and approaches and to present research findings and state-of-the-art solutions in this interdisciplinary field, including theoretical methodology development, its applications in the biosciences, and research on various aspects of bioinformatics. Leading speakers from China will present their results.

Prof. Dr. Quan Zou
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Bioinformatics
  • Machine learning
  • Systems biology
  • Biological networks
  • Computational biology

Published Papers (18 papers)


Research

Open Access Article FDHE-IW: A Fast Approach for Detecting High-Order Epistasis in Genome-Wide Case-Control Studies
Received: 11 June 2018 / Revised: 16 August 2018 / Accepted: 16 August 2018 / Published: 29 August 2018
Abstract
Detecting high-order epistasis in genome-wide association studies (GWASs) is important for characterizing complex human diseases. However, the enormous number of possible single-nucleotide polymorphism (SNP) combinations and the diversity among diseases present a significant computational challenge. Herein, a fast method for detecting high-order epistasis based on an interaction weight (FDHE-IW) is evaluated in the detection of SNP combinations associated with disease. First, the symmetrical uncertainty (SU) value of each SNP is calculated. Then, the top-k SNPs are selected as guiders to identify 2-way SNP combinations with significant interaction weight values. Next, a forward search is employed to detect high-order SNP combinations with significant interaction weight values as candidates. Finally, the candidates are statistically evaluated using a G-test to isolate true positives. The developed algorithm was applied to 12 simulated datasets and an age-related macular degeneration (AMD) dataset and was shown to perform robustly in detecting some high-order disease-causing models.
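The first and last steps of FDHE-IW rely on standard statistics: symmetrical uncertainty for ranking SNPs and the G-test for confirming candidate combinations. The following is a minimal sketch, assuming genotypes coded 0/1/2 and case/control labels coded 0/1; the function names are hypothetical, and the interaction weight itself is not reproduced because the abstract does not define it.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 = independent, 1 = fully dependent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def top_k_snps(snps: Dict[str, Sequence[int]], labels: Sequence[int], k: int) -> List[str]:
    """Rank SNPs (name -> genotype vector) by SU with the phenotype; keep the top k."""
    scores = {name: symmetrical_uncertainty(g, labels) for name, g in snps.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def g_statistic(genotypes: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """G = 2 * sum(O * ln(O / E)) over the genotype-by-phenotype contingency table.

    Empty cells contribute 0, so only observed cells are summed. For a SNP
    combination, pass tuples of genotypes, e.g. list(zip(snp_a, snp_b)).
    """
    n = len(labels)
    observed = Counter(zip(genotypes, labels))
    row_totals = Counter(genotypes)
    col_totals = Counter(labels)
    g = 0.0
    for (gt, lab), o in observed.items():
        expected = row_totals[gt] * col_totals[lab] / n
        g += o * math.log(o / expected)
    return 2.0 * g
```

The G statistic is then compared against a chi-square distribution with (r - 1)(c - 1) degrees of freedom, where r and c are the numbers of genotype (combination) classes and phenotype classes.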

Open Access Article An Integrated Approach for Identifying Molecular Subtypes in Human Colon Cancer Using Gene Expression Data
Received: 22 May 2018 / Revised: 18 July 2018 / Accepted: 27 July 2018 / Published: 2 August 2018
Abstract
Identifying molecular subtypes of colorectal cancer (CRC) may allow for more rational, patient-specific treatment. Various studies have identified molecular subtypes of CRC using gene expression data, but their results are inconsistent and further research is necessary. From a methodological point of view, a progressive approach is needed to identify molecular subtypes in human colon cancer using gene expression data. We propose an approach to identify the molecular subtypes of colon cancer that integrates denoising by the Bayesian robust principal component analysis (BRPCA) algorithm, hierarchical clustering by the directed bubble hierarchical tree (DBHT) algorithm, and feature gene selection by an improved differential evolution based feature selection (DEFSW) algorithm. In this approach, clustering all normal samples, and only normal samples, into one class is taken as the criterion for reasonable clustering subtypes, and the feature selection accounts for sample imbalances among subtypes. With this approach, we identified the molecular subtypes of colon cancer in the mRNA gene expression dataset of 153 colon cancer samples and 19 normal control samples from The Cancer Genome Atlas (TCGA) project. The colon cancer samples were clustered into 7 subtypes with 44 feature genes. Our approach identified finer subtypes of colon cancer with fewer feature genes than two other recent studies and exhibits a generic methodology that might be applied to identify the subtypes of other cancers.

Open Access Article A Model Stacking Framework for Identifying DNA Binding Proteins by Orchestrating Multi-View Features and Classifiers
Received: 3 June 2018 / Revised: 24 July 2018 / Accepted: 24 July 2018 / Published: 1 August 2018
Abstract
Nowadays, various machine learning-based approaches using sequence information alone have been proposed for identifying DNA-binding proteins, which are crucial to many cellular processes, such as DNA replication, DNA repair and DNA modification. Among these methods, building a meaningful feature representation of the sequences and choosing an appropriate classifier are the most challenging tasks. Disclosing the significance and contribution of different feature spaces and classifiers to the final prediction is of the utmost importance, not only for prediction performance but also for providing practical clues for the design of biological experiments. In this study, we propose a model stacking framework by orchestrating multi-view features and classifiers (MSFBinder) to investigate how to integrate and evaluate loosely-coupled models for predicting DNA-binding proteins. The framework integrates multi-view features including Local_DPP, 188D, Position-Specific Scoring Matrix (PSSM)_DWT and autocross-covariance of secondary structures (AC_Struc), which were extracted based on evolutionary information, sequence composition, physicochemical properties and predicted structural information, respectively. These features are fed into various loosely-coupled classifiers such as SVM and random forest. Then, a logistic regression model is applied to evaluate the contributions of these individual classifiers and to make the final prediction. On the training dataset PDB1075, the proposed method achieves an accuracy of 83.53%. On the independent dataset PDB186, the method achieves an accuracy of 81.72%, which outperforms many existing methods. These results suggest that the framework is able to orchestrate various predictive models flexibly with good performance.

Open Access Article Multimodal 3D DenseNet for IDH Genotype Prediction in Gliomas
Received: 26 May 2018 / Revised: 6 July 2018 / Accepted: 16 July 2018 / Published: 30 July 2018
Abstract
Non-invasive prediction of isocitrate dehydrogenase (IDH) genotype plays an important role in glioma diagnosis and prognosis. Recently, research has shown that radiology images can be a potential tool for genotype prediction, and that fusion of multi-modality data by deep learning methods can further provide complementary information to enhance prediction accuracy. However, there is still no effective deep learning architecture to predict IDH genotype from three-dimensional (3D) multimodal medical images. In this paper, we propose a novel multimodal 3D DenseNet (M3D-DenseNet) model to predict IDH genotypes from multimodal magnetic resonance imaging (MRI) data. To evaluate its performance, we conducted experiments on the BRATS-2017 and The Cancer Genome Atlas breast invasive carcinoma (TCGA-BRCA) datasets, using them to obtain image data as input and gene mutation information as the target, respectively. We achieved 84.6% accuracy (area under the curve (AUC) = 85.7%) on the validation dataset. To evaluate its generalizability, we applied transfer learning techniques to predict World Health Organization (WHO) grade status, which also achieved a high accuracy of 91.4% (AUC = 94.8%) on the validation dataset. With automatic feature extraction, effectiveness, and high generalizability, M3D-DenseNet can serve as a useful method for other multimodal radiogenomics problems and has the potential to be applied in clinical decision making.

Open Access Article A Herpes Simplex Virus Thymidine Kinase-Induced Mouse Model of Hepatocellular Carcinoma Associated with Up-Regulated Immune-Inflammatory-Related Signals
Received: 7 June 2018 / Revised: 19 July 2018 / Accepted: 23 July 2018 / Published: 27 July 2018
Abstract
Inflammation and fibrosis in the human liver are often precursors to hepatocellular carcinoma (HCC), yet neither is easily modeled in animals. We previously generated transgenic mice with hepatocyte-specific expression of herpes simplex virus thymidine kinase (HSV-tk). These mice develop hepatitis upon administration of ganciclovir (GCV). However, our HSV-tk transgenic mice developed hepatitis and HCC tumors as early as six months of age even without GCV administration. We analyzed the transcriptome of the HSV-tk HCC tumor and hepatitis tissue using microarray analysis to investigate the possible causes of HCC. Gene Ontology (GO) enrichment analysis showed that the up-regulated genes in the HCC tissue mainly include immune-inflammatory and cell cycle genes, whereas the down-regulated genes in HCC tumors are mainly concentrated in lipid metabolism-related categories. Gene set enrichment analysis (GSEA) showed that immune-inflammatory-related signals in the HSV-tk mice are up-regulated compared to those in Notch mice. Our study suggests that the immune system and inflammation play an important role in HCC development in HSV-tk mice. Specifically, increased expression of immune-inflammatory-related genes is characteristic of HSV-tk mice, and inflammation-induced cell cycle activation may be a precursory step to cancer. The HSV-tk mouse provides a suitable model for studying the relationship between immune-inflammation and HCC and the underlying mechanisms, which may support the development of therapeutic applications in the future.

Open Access Article Genome-Scale Metabolic Model of Actinosynnema pretiosum ATCC 31280 and Its Application for Ansamitocin P-3 Production Improvement
Received: 22 May 2018 / Revised: 6 July 2018 / Accepted: 9 July 2018 / Published: 20 July 2018
Abstract
Actinosynnema pretiosum ATCC 31280 is the producer of the antitumor agent ansamitocin P-3 (AP-3). Understanding the AP-3 biosynthetic pathway and the whole metabolic network in A. pretiosum is important for improving the AP-3 titer. In this study, we reconstructed the first complete Genome-Scale Metabolic Model (GSMM), Aspm1282, for A. pretiosum ATCC 31280 based on the newly sequenced genome, with 87% of reactions having definite functional annotation. The model was validated by effectively predicting growth and the key genes for AP-3 biosynthesis. We then built condition-specific models for an AP-3 high-yield mutant, NXJ-24, by integrating the Aspm1282 model with time-course transcriptome data. Changes in flux distribution reflect a metabolic shift from growth-related pathways to secondary metabolism pathways from the second day of cultivation onward. The AP-3 and methionine metabolisms were both enriched in active flux during the last two days, which uncovered the relationships among cell growth, activation of methionine metabolism, and the biosynthesis of AP-3. Furthermore, we identified four combinatorial gene modifications for overproducing AP-3 by in silico strain design, which improved the theoretical flux of AP-3 biosynthesis from 0.201 to 0.372 mmol/gDW/h. Upregulation of the methionine metabolic pathway is a potential strategy to improve the production of AP-3.

Open Access Article Application of Whole Exome and Targeted Panel Sequencing in the Clinical Molecular Diagnosis of 319 Chinese Families with Inherited Retinal Dystrophy and Comparison Study
Received: 3 May 2018 / Revised: 11 July 2018 / Accepted: 12 July 2018 / Published: 19 July 2018
Abstract
Inherited retinal dystrophies (IRDs) are a group of clinically and genetically heterogeneous diseases involving more than 280 genes and at least 20 different clinical phenotypes. In this study, we aimed to identify the disease-causing gene variants of 319 Chinese patients with IRD and to compare the pros and cons of targeted panel sequencing and whole exome sequencing (WES). Patients were assigned to analysis with a hereditary eye disease enrichment panel (HEDEP) or to WES examination based on their time of recruitment. The HEDEP was able to capture 441 hereditary eye disease genes, including 291 genes related to IRD. As RPGR ORF15 was difficult to capture, all samples were subjected to Sanger sequencing for this region. Among the 163 disease-causing variants identified in this study, 73 had been previously reported, and the other 90 were novel. The genes most commonly implicated in each inheritance pattern of IRD in this cohort are presented. HEDEP and WES achieved diagnostic yields of 41.2% and 33.0%, respectively. In addition, nine patients were found by Sanger sequencing to carry pathogenic mutations in the RPGR ORF15 region. Our study demonstrates that HEDEP can be used as a first-tier test for patients with IRDs.

Open Access Article Ensemble Consensus-Guided Unsupervised Feature Selection to Identify Huntington’s Disease-Associated Genes
Received: 30 May 2018 / Revised: 6 July 2018 / Accepted: 9 July 2018 / Published: 12 July 2018
Abstract
Due to the complexity of the pathological mechanisms of neurodegenerative diseases, traditional differentially-expressed gene selection methods cannot detect disease-associated genes accurately. Recent studies have shown that consensus-guided unsupervised feature selection (CGUFS) performs well in feature selection for identifying disease-associated genes. Because the random initialization of the feature selection matrix in CGUFS makes the final disease-associated gene set unstable, in this study we propose an ensemble method based on CGUFS, namely ensemble consensus-guided unsupervised feature selection (ECGUFS), to further improve the accuracy of disease-associated gene identification and the stability of feature gene sets. We also propose a bagging integration strategy to integrate the results of CGUFS. Lastly, we conducted experiments with Huntington’s disease RNA sequencing (RNA-Seq) data and obtained the final feature gene set, in which we detected 287 disease-associated genes. Enrichment analysis of these genes showed that the postsynaptic density, postsynaptic membrane, synapse, and cell junction are all affected during the disease’s progression. Moreover, ECGUFS greatly improved the accuracy of disease-associated gene prediction and the stability of the disease-associated gene set. We classified the labeled samples using a linear support vector machine with 10-fold cross-validation. The average accuracy of 0.9 suggests the effectiveness of the feature gene set.
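The abstract says that ECGUFS integrates repeated CGUFS runs with a bagging strategy but does not give its details. A generic frequency-voting bagging scheme might look like the sketch below, assuming a hypothetical `selector` callable stands in for one CGUFS run (its random initialization is modeled by the RNG passed in); the 50 rounds and the 0.5 frequency threshold are illustrative assumptions, not the authors' settings.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Set

# Hypothetical base selector (a stand-in for one CGUFS run): it receives a
# bootstrap sample list and a seeded RNG (modelling random initialization)
# and returns the set of genes it selected.
Selector = Callable[[List[object], random.Random], Set[str]]


def bagged_gene_set(
    samples: Sequence[object],
    selector: Selector,
    n_rounds: int = 50,
    min_frequency: float = 0.5,
    seed: int = 0,
) -> List[str]:
    """Keep genes selected in at least `min_frequency` of the bootstrap rounds."""
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n_rounds):
        # Bootstrap: draw len(samples) samples with replacement.
        bootstrap = [rng.choice(samples) for _ in range(len(samples))]
        votes.update(selector(bootstrap, rng))
    threshold = min_frequency * n_rounds
    return sorted(gene for gene, count in votes.items() if count >= threshold)
```

Genes that a single unstable run picks by chance rarely reach the vote threshold, which is the stabilizing effect the abstract attributes to the ensemble.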

Open Access Article A Novel Probability Model for LncRNA–Disease Association Prediction Based on the Naïve Bayesian Classifier
Received: 26 May 2018 / Revised: 24 June 2018 / Accepted: 3 July 2018 / Published: 8 July 2018
Abstract
An increasing number of studies have indicated that long non-coding RNAs (lncRNAs) play crucial roles in biological processes and in the diagnosis, prognosis, and treatment of complex diseases. However, experimentally validated associations between lncRNAs and diseases are still very limited. Recently, computational models have been developed to discover potential associations between lncRNAs and diseases by integrating multiple heterogeneous biological data, and this has become a hot topic in biological research. In this article, we constructed a global tripartite network by integrating a variety of biological information, including miRNA–disease, miRNA–lncRNA, and lncRNA–disease associations and interactions. Then, we constructed a global quadruple network by appending gene–lncRNA interaction, gene–disease association, and gene–miRNA interaction networks to the global tripartite network. Subsequently, based on these two global networks, a novel approach based on the naïve Bayesian classifier was proposed to predict potential lncRNA–disease associations (NBCLDA). Compared with state-of-the-art methods, our new method does not rely entirely on known lncRNA–disease associations and achieves reliable performance, with effective areas under the ROC curve (AUCs) in leave-one-out cross-validation. Moreover, to further estimate the performance of NBCLDA, case studies of colorectal cancer, prostate cancer, and glioma were carried out, and the simulation results demonstrated that NBCLDA can be an excellent tool for biomedical research in the future.
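NBCLDA's exact probability model is not given in the abstract. As a generic illustration of the naïve Bayesian principle it builds on, the sketch below combines several pieces of evidence for one lncRNA–disease pair (for example, each shared miRNA or gene neighbor in the tripartite or quadruple network) under the conditional-independence assumption. The function name and the likelihood-ratio formulation are assumptions, not the authors' model.

```python
import math
from typing import Iterable


def naive_bayes_posterior(prior: float, likelihood_ratios: Iterable[float]) -> float:
    """P(associated | evidence) under the naive (conditional-independence) assumption.

    Each likelihood ratio is P(e_i | associated) / P(e_i | not associated) for one
    piece of evidence, e.g. one shared miRNA or gene neighbour of the pair.
    Posterior log-odds = prior log-odds + sum of log likelihood ratios.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        if ratio <= 0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Working in log-odds keeps the product of many ratios numerically stable, and candidate pairs can then be ranked by their posterior.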

Open Access Article Gene Regulatory Networks Reconstruction Using the Flooding-Pruning Hill-Climbing Algorithm
Received: 23 May 2018 / Revised: 28 June 2018 / Accepted: 2 July 2018 / Published: 6 July 2018
Abstract
The explosion of genomic data provides new opportunities to improve gene regulatory network reconstruction. Because of its inherently probabilistic nature, the Bayesian network is one of the most promising methods. However, excessive computation time and the requirement for a large number of biological samples reduce its effectiveness and applicability to gene regulatory network reconstruction. In this paper, the Flooding-Pruning Hill-Climbing (FPHC) algorithm is proposed as a novel hybrid method based on Bayesian networks for gene regulatory network reconstruction. Building on our previous work, we propose the concept of DPI Level, based on the data processing inequality (DPI), to better identify the neighbors of each gene when biological samples are insufficient. Then, we use the search-and-score approach to learn the final network structure in the restricted search space. We first analyze and validate the effectiveness of FPHC theoretically. Then, extensive comparison experiments are carried out on known Bayesian networks and on biological networks from the DREAM (Dialogue on Reverse Engineering Assessment and Methods) challenge. The results show that the FPHC algorithm, under recommended parameters, outperforms, on average, the original hill-climbing and Max-Min Hill-Climbing (MMHC) methods with respect to network structure and running time. In addition, our results show that FPHC is more suitable for gene regulatory network reconstruction with limited data.
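The data processing inequality states that if gene A influences gene C only through gene B, then I(A;C) ≤ min(I(A;B), I(B;C)), so the weakest edge of a fully connected triplet is a likely indirect link. The paper's DPI Level refines this idea in a way the abstract does not define; the sketch below shows only the classic DPI pruning rule (as popularized by ARACNE), with hypothetical names and mutual information values assumed to be precomputed.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set

Edge = FrozenSet[str]


def dpi_prune(mi: Dict[Edge, float]) -> Set[Edge]:
    """Apply the classic data processing inequality (DPI) rule to an MI network.

    `mi` maps an unordered gene pair to its mutual information. In every fully
    connected triplet, the weakest edge is treated as an indirect interaction
    and removed (the ARACNE-style rule); ties are kept.
    """
    genes = sorted({g for edge in mi for g in edge})
    removed: Set[Edge] = set()
    for a, b, c in combinations(genes, 3):
        edges = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if not all(e in mi for e in edges):
            continue  # not a triangle, so DPI does not apply
        weakest, second = sorted(edges, key=lambda e: mi[e])[:2]
        if mi[weakest] < mi[second]:
            removed.add(weakest)
    return set(mi) - removed
```

The surviving edges define a restricted candidate-neighbor set, which is the kind of reduced search space a subsequent search-and-score step such as hill climbing can explore.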

Open Access Article DynSig: Modelling Dynamic Signaling Alterations along Gene Pathways for Identifying Differential Pathways
Received: 12 May 2018 / Revised: 25 June 2018 / Accepted: 25 June 2018 / Published: 27 June 2018
Abstract
Although a number of methods have been proposed for identifying differentially expressed pathways (DEPs), few consider the dynamic components of pathway networks, i.e., gene links. Here we propose DynSig, a signaling dynamics detection method for identifying DEPs, which detects molecular signaling changes in cancerous cells along pathway topology. Specifically, DynSig relies on gene links, instead of gene nodes, in pathways, and models the dynamic behavior of pathways based on a Markov chain model (MCM). By incorporating the dynamics of molecular signaling, DynSig allows for an in-depth characterization of pathway activity. To identify DEPs, a novel statistic of pathway activity alteration was formulated as an overall signaling perturbation score between sample classes. Experimental results on both simulated and real-world datasets demonstrate the effectiveness and efficiency of the proposed method in identifying differential pathways.
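DynSig's actual perturbation statistic is not spelled out in the abstract. The sketch below illustrates the general link-centric Markov chain idea under stated assumptions: per-class link weights (for example, derived from expression) are row-normalized into a transition matrix, the stationary distribution π is found by power iteration on the lazy chain (I + P)/2 (same stationary distribution, no oscillation on periodic chains), and the stationary flow π_i P_ij on each link is compared between classes with an L1 distance. All names and the L1 score are hypothetical, and a unique stationary distribution assumes the pathway graph is strongly connected.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]  # (source gene, target gene)


def transition_matrix(nodes: List[str], weights: Dict[Link, float]) -> List[List[float]]:
    """Row-normalize non-negative link weights; genes with no out-links get a self-loop."""
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    matrix = [[0.0] * size for _ in range(size)]
    for (src, dst), w in weights.items():
        matrix[index[src]][index[dst]] += w
    for i in range(size):
        total = sum(matrix[i])
        if total == 0:
            matrix[i][i] = 1.0
        else:
            matrix[i] = [w / total for w in matrix[i]]
    return matrix


def stationary_distribution(matrix: List[List[float]], tol: float = 1e-12,
                            max_iter: int = 10_000) -> List[float]:
    """Power iteration on the lazy chain (I + P) / 2."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        step = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        nxt = [0.5 * p + 0.5 * s for p, s in zip(pi, step)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def link_flows(nodes: List[str], weights: Dict[Link, float]) -> Dict[Link, float]:
    """Stationary probability flow pi_i * P_ij carried by each pathway link."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = transition_matrix(nodes, weights)
    pi = stationary_distribution(matrix)
    return {(s, d): pi[index[s]] * matrix[index[s]][index[d]] for (s, d) in weights}


def perturbation_score(flows_a: Dict[Link, float], flows_b: Dict[Link, float]) -> float:
    """Hypothetical pathway-level score: L1 distance between two classes' link flows."""
    links = set(flows_a) | set(flows_b)
    return sum(abs(flows_a.get(e, 0.0) - flows_b.get(e, 0.0)) for e in links)
```

Scoring flows on links rather than probabilities on nodes mirrors the abstract's emphasis on gene links as the dynamic components of a pathway.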

Open Access Article Integrative Analysis of Dysregulated lncRNA-Associated ceRNA Network Reveals Functional lncRNAs in Gastric Cancer
Received: 3 May 2018 / Revised: 28 May 2018 / Accepted: 12 June 2018 / Published: 18 June 2018
Abstract
Mounting evidence suggests that long noncoding RNAs (lncRNAs) play important roles in the regulation of gene expression by acting as competing endogenous RNAs (ceRNAs). However, the regulatory mechanisms of lncRNAs acting as ceRNAs in gastric cancer (GC) are not fully understood. Here, we first constructed a dysregulated lncRNA-associated ceRNA network through integrative analysis of the gene expression profiles of lncRNAs, microRNAs (miRNAs), and messenger RNAs (mRNAs). Then, we identified three lncRNAs (RP5-1120P11, DLEU2, and DDX11-AS1) as hub lncRNAs whose associated ceRNA subnetworks were involved in cell cycle-related processes and cancer-related pathways. Furthermore, we confirmed that two of these lncRNAs (DLEU2 and DDX11-AS1) were significantly upregulated in GC tissues, promote GC cell proliferation, and negatively regulate miRNA expression. These hub lncRNAs (DLEU2 and DDX11-AS1) could have oncogenic functions and act as potential ceRNAs that sponge miRNAs. Our findings not only provide novel insights into ceRNA regulation in GC but also provide opportunities for the functional characterization of lncRNAs in future studies.

Open Access Article Decision Variants for the Automatic Determination of Optimal Feature Subset in RF-RFE
Received: 25 April 2018 / Revised: 30 May 2018 / Accepted: 6 June 2018 / Published: 15 June 2018
Abstract
Feature selection, which identifies a set of the most informative features from the original feature space, has been widely used to simplify predictors. Recursive feature elimination (RFE), one of the most popular feature selection approaches, is effective at reducing data dimensionality and increasing efficiency. RFE produces a ranking of features, as well as candidate subsets with their corresponding accuracies. The subset with the highest accuracy (HA) or a preset number of features (PreNum) is often used as the final subset. However, the HA subset may contain a large number of features, and when there is no prior knowledge of the preset number, the final subset selection is often ambiguous and subjective. A proper decision variant is therefore needed to determine the optimal subset automatically. In this study, we conduct pioneering work exploring decision variants applied after obtaining a list of candidate subsets from RFE. We provide a detailed analysis and comparison of several decision variants for automatically selecting the optimal feature subset. The random forest (RF)-recursive feature elimination (RF-RFE) algorithm and a voting strategy are introduced. We validated the variants on two very different molecular biology datasets, one from a toxicogenomic study and the other from protein sequence analysis. The study provides an automated way to determine the optimal feature subset when using RF-RFE.
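The two baseline decision rules named above (HA and PreNum) can be stated precisely once RFE has produced its list of candidate subsets. The sketch below, assuming each candidate is summarized as a (number of features, accuracy) pair, implements both plus one hypothetical tolerance-based variant for contrast; it is not one of the paper's variants.

```python
from typing import List, Tuple

# One RFE candidate: (number of features, cross-validated accuracy).
Candidate = Tuple[int, float]


def highest_accuracy(candidates: List[Candidate]) -> Candidate:
    """HA: the subset with the highest accuracy (ties broken toward fewer features)."""
    return max(candidates, key=lambda c: (c[1], -c[0]))


def preset_number(candidates: List[Candidate], n_features: int) -> Candidate:
    """PreNum: the subset whose size equals a number fixed in advance."""
    for candidate in candidates:
        if candidate[0] == n_features:
            return candidate
    raise ValueError(f"no candidate subset with {n_features} features")


def smallest_within_tolerance(candidates: List[Candidate],
                              tolerance: float = 0.01) -> Candidate:
    """Hypothetical variant: the smallest subset within `tolerance` of the best accuracy."""
    best = max(acc for _, acc in candidates)
    eligible = [c for c in candidates if c[1] >= best - tolerance]
    return min(eligible, key=lambda c: c[0])
```

With candidates such as (100, 0.90), (50, 0.91), (20, 0.905) and (10, 0.85), HA returns the 50-feature subset, while the tolerance variant trades 0.005 accuracy for a 20-feature subset, illustrating why HA can select more features than necessary.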

Open Access Article RECTA: Regulon Identification Based on Comparative Genomics and Transcriptomics Analysis
Received: 30 March 2018 / Revised: 19 May 2018 / Accepted: 25 May 2018 / Published: 30 May 2018
Abstract
Regulons, which serve as co-regulated gene groups contributing to the transcriptional regulation of microbial genomes, have the potential to aid in understanding the underlying regulatory mechanisms. In this study, we designed a novel computational pipeline, regulon identification based on comparative genomics and transcriptomics analysis (RECTA), for predicting regulons related to the gene regulatory network under certain conditions. To demonstrate the effectiveness of this tool, we applied RECTA to Lactococcus lactis MG1363 data to elucidate acid-response regulons. A total of 51 regulons were identified, 14 of which had computationally verified significance. Among these 14 regulons, five were computationally predicted to be connected with the acid stress response. Based on literature validation, 33 genes in Lactococcus lactis MG1363 were found to have orthologous genes associated with six regulons. An acid response-related regulatory network was constructed, involving two trans-membrane proteins, eight regulons (llrA, llrC, hllA, ccpA, NHP6A, rcfB, regulons #8 and #39), nine functional modules, and 33 genes with orthologous genes known to be associated with acid stress. The predicted response pathways could serve as promising candidates for better acid tolerance engineering in Lactococcus lactis. Our RECTA pipeline provides an effective way to construct a reliable gene regulatory network through regulon elucidation, has strong application potential, and can be effectively applied to other bacterial genomes where the elucidation of the transcriptional regulation network is needed.

Open Access Article: The Cross-Entropy Based Multi-Filter Ensemble Method for Gene Selection
Received: 12 March 2018 / Revised: 20 April 2018 / Accepted: 2 May 2018 / Published: 17 May 2018
PDF Full-text (688 KB) | HTML Full-text | XML Full-text
Abstract
Gene expression profile data are high-dimensional, have few samples, and consist of continuous values, which makes using them to classify tumor samples a great challenge. This paper proposes a cross-entropy based multi-filter ensemble (CEMFE) method for microarray data classification. First, multiple filters are applied to the microarray data to obtain several pre-selected feature subsets with different classification abilities, and the top N ranked genes of each subset are merged to form a new data set. Second, a cross-entropy algorithm is used to remove redundant data from this data set. Finally, a wrapper method based on forward feature selection is used to select the best feature subset. The experimental results show that the proposed method is more efficient than other gene selection methods and achieves higher classification accuracy with fewer characteristic genes.
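The ensemble and wrapper stages described above can be sketched in Python. This is a minimal, hypothetical sketch, not the authors' implementation: the two filters (absolute difference of class means and a Fisher score), the leave-one-out nearest-centroid classifier used to score subsets in the wrapper, and all function names are assumptions. The cross-entropy redundancy-removal stage is omitted because the abstract does not specify its criterion.

```python
from statistics import mean, pvariance


def _split(X, y, j):
    """Values of feature j for class 0 and class 1 (binary labels assumed)."""
    a = [row[j] for row, label in zip(X, y) if label == 0]
    b = [row[j] for row, label in zip(X, y) if label == 1]
    return a, b


def mean_diff_score(X, y, j):
    """Filter 1 (assumed): absolute difference between the two class means."""
    a, b = _split(X, y, j)
    return abs(mean(a) - mean(b))


def fisher_score(X, y, j):
    """Filter 2 (assumed): squared mean difference over summed class variances."""
    a, b = _split(X, y, j)
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-12)


def top_n_union(X, y, filters, n):
    """Stage 1: rank genes with every filter and merge each filter's top-n genes."""
    chosen = set()
    for score in filters:
        ranked = sorted(range(len(X[0])), key=lambda j: score(X, y, j), reverse=True)
        chosen.update(ranked[:n])
    return sorted(chosen)


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            if rows:
                centroids[label] = [mean(r[j] for r in rows) for j in features]
        point = [X[i][j] for j in features]
        pred = min(centroids,
                   key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])))
        correct += pred == y[i]
    return correct / len(X)


def forward_select(X, y, candidates):
    """Final stage: greedy forward selection; stop when accuracy no longer improves."""
    selected, best_acc = [], 0.0
    remaining = list(candidates)
    while remaining:
        acc, best = max(((loo_accuracy(X, y, selected + [c]), c) for c in remaining),
                        key=lambda t: t[0])
        if acc <= best_acc:
            break
        selected.append(best)
        remaining.remove(best)
        best_acc = acc
    return selected, best_acc
```

Using the union of several filters' top-ranked genes keeps genes that any single criterion considers informative, while the greedy wrapper then trims that pool down to a small subset that actually helps the chosen classifier.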

Open Access Article: A Novel Hybrid Sequence-Based Model for Identifying Anticancer Peptides
Received: 24 January 2018 / Revised: 14 February 2018 / Accepted: 27 February 2018 / Published: 13 March 2018
Cited by 2 | PDF Full-text (3848 KB) | HTML Full-text | XML Full-text
Abstract
Cancer is a serious health issue worldwide. Traditional treatments focus on killing cancer cells with anticancer drugs or radiation therapy, but these methods are costly and have side effects. With the discovery of anticancer peptides, great progress has been made in cancer treatment. To promote the application of anticancer peptides in cancer treatment, computational methods are needed to identify anticancer peptides (ACPs). In this paper, we propose a sequence-based model for identifying ACPs (SAP). In SAP, each peptide is represented either by 400-dimensional (400D) features or by 400D features combined with g-gap dipeptide features, and irrelevant features are then pruned using the maximum relevance-maximum distance (MRMD) method. The experimental results demonstrate that our model outperforms some existing methods. Furthermore, the model has been extended to other classifiers, and its performance remains stable compared with some state-of-the-art works.
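Assuming the 400D representation is the standard dipeptide composition (the 20 × 20 ordered amino-acid pairs), both it and the g-gap variant can be computed by counting residue pairs that have g residues between them. The Python sketch below is illustrative only: the function names, the normalization by the number of counted pairs, and the concatenated 800D vector are assumptions, and the MRMD pruning step is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs
INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def g_gap_dipeptide(seq, g=0):
    """400-D frequency vector of residue pairs (seq[i], seq[i + g + 1]).

    g = 0 gives the ordinary dipeptide composition; g > 0 skips g residues
    between the two members of each pair. Pairs containing non-standard
    residues are ignored (an assumption of this sketch).
    """
    counts = [0.0] * len(PAIRS)
    total = 0
    for i in range(len(seq) - g - 1):
        pair = seq[i] + seq[i + g + 1]
        if pair in INDEX:
            counts[INDEX[pair]] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def sap_like_features(seq, g):
    """Hypothetical 800-D vector: dipeptide composition followed by g-gap features."""
    return g_gap_dipeptide(seq, 0) + g_gap_dipeptide(seq, g)
```

For the toy peptide "ACDA", g = 0 counts AC, CD, and DA (one third each), while g = 1 counts AD and CA (one half each).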

Open Access Article: PClass: Protein Quaternary Structure Classification by Using Bootstrapping Strategy as Model Selection
Received: 21 December 2017 / Revised: 24 January 2018 / Accepted: 8 February 2018 / Published: 14 February 2018
PDF Full-text (741 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
A protein quaternary structure complex, also known as a multimer, plays an important role in the cell. For example, the dimer structure of transcription factors is involved in gene regulation, while the trimer structure of virus-infection-associated glycoproteins is related to the human immunodeficiency virus. Classifying protein quaternary structure complexes would greatly help proteomics research in the post-genome era, yet classification systems for protein quaternary structures have not been widely developed. In this study, we therefore designed a two-layer machine learning architecture and developed the classification system PClass. Protein quaternary structures are divided into five categories: monomer, dimer, trimer, tetramer, and other subunit classes. We propose a new model selection method that combines the bootstrap method with a support vector machine. Each type of complex is classified based on sequence, entropy, and accessible surface area, generating multiple feature modules, and the most effective model is then selected as the feature module for each type of complex. At this stage, the best performance reaches a Matthews correlation coefficient (MCC) of up to 70%. The second layer integrates the first-layer modules and uses six machine learning methods to improve prediction performance, raising MCC by more than 10%. Finally, we analyzed the performance of our classification system on dimeric transcription factors and trimeric virus-infection-associated glycoproteins. PClass is available via a web interface at http://predictor.nchu.edu.tw/PClass/.
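How MCC and bootstrap resampling can be combined to choose a feature module is illustrated by the simplified Python sketch below. It assumes binary one-vs-rest labels for a single complex type (1 = this type) and resamples a fixed set of held-out predictions instead of retraining a support vector machine on each bootstrap sample, so it is not the PClass procedure itself; the function names, module names, number of rounds, and seed are hypothetical.

```python
import math
import random


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = this complex type)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def bootstrap_mcc(y_true, y_pred, rounds=200, seed=0):
    """Mean MCC over bootstrap resamples (drawn with replacement) of the evaluation set."""
    rng = random.Random(seed)
    n = len(y_true)
    total = 0.0
    for _ in range(rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        total += mcc([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return total / rounds


def select_module(y_true, predictions_by_module, rounds=200, seed=0):
    """Pick the feature module whose predictions have the highest bootstrap mean MCC."""
    return max(predictions_by_module,
               key=lambda name: bootstrap_mcc(y_true, predictions_by_module[name], rounds, seed))
```

Averaging MCC over resamples rewards modules that perform well consistently rather than on one lucky split, and MCC itself stays informative when one complex type is much rarer than the rest.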

Open Access Article: MiR-93-5p Promotes Cell Proliferation through Down-Regulating PPARGC1A in Hepatocellular Carcinoma Cells by Bioinformatics Analysis and Experimental Verification
Received: 7 December 2017 / Revised: 15 January 2018 / Accepted: 16 January 2018 / Published: 22 January 2018
Cited by 1 | PDF Full-text (4456 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Peroxisome proliferator-activated receptor gamma coactivator-1 alpha (PPARGC1A, formerly known as PGC-1α) is a transcriptional coactivator and metabolic regulator. Previous studies have mainly focused on the association between PPARGC1A and hepatoma, but the regulatory mechanism remains unknown. A microRNA associated with cancer (oncomiR), miR-93-5p, has recently been found to play an essential role in the tumorigenesis and progression of various carcinomas, including liver cancer. This paper therefore aims to explore the regulatory mechanism linking these two molecules in hepatoma cells. First, an integrative analysis of miRNA–mRNA modules was performed on microarray and The Cancer Genome Atlas (TCGA) data, yielding the core regulatory network and the miR-93-5p/PPARGC1A pair. Second, a series of experiments in hepatoma cells showed that miR-93-5p was upregulated and promoted cell proliferation. Third, the inverse correlation between miR-93-5p and PPARGC1A expression was validated. Finally, we inferred that miR-93-5p plays an essential role in inhibiting PPARGC1A expression by directly targeting the 3′-untranslated region (UTR) of its mRNA. In conclusion, these results suggest that miR-93-5p overexpression contributes to hepatoma development by inhibiting PPARGC1A, and the miR-93-5p/PPARGC1A axis is anticipated to offer a promising therapeutic strategy for patients with liver cancer in the future.
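An inverse correlation between two expression profiles is commonly quantified with the Pearson correlation coefficient, where a value near −1 indicates a strong inverse relationship. The Python sketch below is a generic illustration, not the authors' analysis: the choice of Pearson's r and the function name are assumptions, and it takes paired per-sample values (for example, miR-93-5p and PPARGC1A levels) as plain lists.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between paired expression values x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)
```

A negative r only shows that the two profiles move in opposite directions; establishing direct targeting of the 3′-UTR still requires experimental evidence such as that described in the abstract.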
