Selected Papers from the Third CCF Bioinformatics Conference (CBC2018)

A special issue of Genes (ISSN 2073-4425). This special issue belongs to the section "Technologies and Resources for Genetics".

Deadline for manuscript submissions: closed (30 November 2018) | Viewed by 44243

Special Issue Editor


Guest Editor
Institute of Computing Technology, Chinese Academy of Sciences Beijing, Beijing 100864, China
Interests: bioinformatics; protein structure prediction; electron tomography; molecules high-performance computing

Special Issue Information

Dear Colleagues,

Bioinformatics has become an intensive research topic over the past decade and has attracted many leading scientists working in biology, physics, mathematics, and computer science. In China in particular, many computer science academics are engaged in bioinformatics, aiming to provide computing skills and applications for biological data mining and management, algorithm design and analysis, software development, and related tasks. In this field, optimization, statistical, and many other computational methods and tools have been developed and are widely used. The goal of the CCF Bioinformatics Conference (CBC) is to provide a domestic forum for scientists, researchers, educators, and practitioners to exchange ideas and approaches, to present research findings and state-of-the-art solutions in this interdisciplinary field, and to build a constructive dialogue between two research areas: computer science and biology. The Third CCF Bioinformatics Conference (CBC 2018) will take place October 12–14, 2018, in Xi’an, Shaanxi, P.R. China. The conference is administered by the Bioinformatics Committee of CCF. CBC 2018 is organized by Xidian University and co-organized by Xi'an Jiaotong University. The website of CBC 2018 is http://bioinformatics.xidian.edu.cn/cbc2018en/.

The conference will highlight challenges emerging from this encounter and solicit papers that propose state-of-the-art computational solutions to practical and theoretical issues that arise from the biological and medical fields.

Sincerely,

Prof. Fa Zhang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Genes is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Genomics and Epigenomics
  • Transcriptomics and Proteomics
  • Metagenomics and Metatranscriptomics
  • Genetics and Evolution
  • Structure and Functions of Non-coding RNA
  • Biological Networks and Systems Biology
  • Biomedical Information Extraction and Text Mining
  • Big Data Analytics in Biology and Precision medicine
  • Algorithms and High Performance Computing in Bioinformatics
  • Construction of Biological Databases and Bioinformatics Software Development

Published Papers (12 papers)


Research

17 pages, 23480 KiB  
Article
Analysis of Topological Parameters of Complex Disease Genes Reveals the Importance of Location in a Biomolecular Network
by Xiaohui Zhao and Zhi-Ping Liu
Genes 2019, 10(2), 143; https://doi.org/10.3390/genes10020143 - 14 Feb 2019
Cited by 23 | Viewed by 3452
Abstract
Network biology and medicine provide unprecedented opportunities and challenges for deciphering disease mechanisms from integrative viewpoints. Disease genes and their products exert their dysfunctions via physical and biochemical interactions in the form of a molecular network. The topological parameters of these disease genes in the interactome are of prominent interest for understanding their functionality from a systematic perspective. In this work, we provide a systems biology analysis of the topological features of complex disease genes in an integrated biomolecular network. First, we identify the characteristics of four network parameters in the ten most frequently studied disease genes and identify several specific patterns of their topologies. Then, we confirm our findings in the other disease genes of three complex disorders (i.e., Alzheimer’s disease, diabetes mellitus, and hepatocellular carcinoma). The results reveal that the disease genes tend to have a higher betweenness centrality, a smaller average shortest path length, and a smaller clustering coefficient when compared to normal genes, whereas they have no significant degree prominence. The features highlight the importance of gene location in the integrated functional linkages.
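As a rough illustration of two of the topological parameters named in this abstract, the Python sketch below computes a node's clustering coefficient and its average shortest path length on a toy undirected graph. The toy graph, the function names, and the adjacency-dictionary representation are assumptions made for illustration only; they are not taken from the paper, and betweenness centrality is omitted for brevity.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency dictionary from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of pairs of the node's neighbors that are themselves linked."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


def average_shortest_path_length(graph, source):
    """Mean hop count from source to every node reachable from it (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in graph[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    reachable = len(dist) - 1
    if reachable == 0:
        return 0.0
    return sum(dist.values()) / reachable


# Hypothetical toy network: a triangle A-B-C with a pendant node D on C.
toy = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
```

On this toy network, node C bridges the triangle and the pendant node, so it has a lower clustering coefficient than A or B while sitting closer, on average, to every other node.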

13 pages, 2276 KiB  
Article
Selecting Near-Native Protein Structures from Predicted Decoy Sets Using Ordered Graphlet Degree Similarity
by Xu Han, Li Li and Yonggang Lu
Genes 2019, 10(2), 132; https://doi.org/10.3390/genes10020132 - 11 Feb 2019
Cited by 2 | Viewed by 2513
Abstract
Effective prediction of protein tertiary structure from sequence is an important and challenging problem in computational structural biology. Ab initio protein structure prediction is based on amino acid sequence alone; thus, it has a wide application area. With the ab initio method, a large number of candidate protein structures, called a decoy set, can be predicted; however, it is difficult to select a good near-native structure from the predicted decoy set. In this work, we propose a new method for selecting the near-native structure from the decoy set based on both contact map overlap (CMO) and graphlets. By generalizing graphlets to ordered graphs, and using dynamic programming to select the optimal alignment with an introduced gap penalty, a GR_score is defined for calculating the similarity between the three-dimensional (3D) decoy structures. The proposed method was applied to all 54 single-domain targets in CASP11 and all 43 targets in CASP10, and ensemble clustering was used to cluster the protein decoy structures based on the computed GR_scores. The most popular centroid structure was selected as the near-native structure. The experiments showed that compared to the SPICKER method, which is used in I-TASSER, the proposed method can usually select better near-native structures in terms of the similarity between the selected structure and the true native structure.

13 pages, 2830 KiB  
Article
Self-Adjusting Ant Colony Optimization Based on Information Entropy for Detecting Epistatic Interactions
by Boxin Guan and Yuhai Zhao
Genes 2019, 10(2), 114; https://doi.org/10.3390/genes10020114 - 01 Feb 2019
Cited by 15 | Viewed by 3209
Abstract
The epistatic interactions of single nucleotide polymorphisms (SNPs) are considered to be an important factor in determining the susceptibility of individuals to complex diseases. Although many methods have been proposed to detect such interactions, the development of detection algorithms is still ongoing due to the computational burden in large-scale association studies. In this paper, to deal with the intensive computing problem of detecting epistatic interactions in large-scale datasets, a self-adjusting ant colony optimization based on information entropy (IEACO) is proposed. The algorithm can automatically self-adjust the path selection strategy according to the real-time information entropy. The performance of IEACO is compared with that of ant colony optimization (ACO), AntEpiSeeker, AntMiner, and epiACO on a set of simulated datasets and a real genome-wide dataset. The results of extensive experiments show that the proposed method is superior to the other methods.
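The abstract does not say how the information entropy is computed. As a minimal sketch, assuming it is the Shannon entropy of the normalized pheromone weights over candidate SNPs, the Python below shows how such a quantity could be measured. The function names and the use of pheromone weights as input are hypothetical and do not describe the IEACO implementation.

```python
import math


def shannon_entropy(weights):
    """Shannon entropy (in bits) of non-negative weights after normalization."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    entropy = 0.0
    for w in weights:
        if w > 0:
            p = w / total
            entropy -= p * math.log2(p)
    return entropy


def normalized_entropy(weights):
    """Entropy scaled to [0, 1]; 1 means all candidates are equally weighted."""
    n = len(weights)
    if n < 2:
        return 0.0
    return shannon_entropy(weights) / math.log2(n)
```

Under this assumption, a value near 1 would indicate that the colony is still exploring broadly, whereas a value near 0 would indicate that the pheromone has concentrated on a few candidates.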

17 pages, 57429 KiB  
Article
A Hybrid Clustering Algorithm for Identifying Cell Types from Single-Cell RNA-Seq Data
by Xiaoshu Zhu, Hong-Dong Li, Yunpei Xu, Lilu Guo, Fang-Xiang Wu, Guihua Duan and Jianxin Wang
Genes 2019, 10(2), 98; https://doi.org/10.3390/genes10020098 - 29 Jan 2019
Cited by 14 | Viewed by 6434
Abstract
Single-cell RNA sequencing (scRNA-seq) has recently brought new insight into cell differentiation processes and functional variation in cell subtypes from homogeneous cell populations. A lack of prior knowledge makes unsupervised machine learning methods, such as clustering, suitable for analyzing scRNA-seq data. However, there are several limitations to overcome, including high dimensionality, clustering result instability, and parameter adjustment complexity. In this study, we propose a method that combines structure entropy and k-nearest neighbors to identify cell subpopulations in scRNA-seq data. In contrast to existing clustering methods for identifying cell subtypes, minimized structure entropy results in natural communities without specifying the number of clusters. To investigate the performance of our model, we applied it to eight scRNA-seq datasets and compared our method with three existing methods (nonnegative matrix factorization, single-cell interpretation via multikernel learning, and structural entropy minimization principle). The experimental results showed that our approach achieves, on average, better performance in these datasets compared to the benchmark methods.

11 pages, 1530 KiB  
Article
PVTree: A Sequential Pattern Mining Method for Alignment Independent Phylogeny Reconstruction
by Yongyong Kang, Xiaofei Yang, Jiadong Lin and Kai Ye
Genes 2019, 10(2), 73; https://doi.org/10.3390/genes10020073 - 22 Jan 2019
Cited by 6 | Viewed by 3499
Abstract
A phylogenetic tree is essential to understanding evolution, and it is usually constructed through multiple sequence alignment, which suffers from heavy computational burdens and requires sophisticated parameter tuning. Recently, alignment-free methods based on k-mer profiles or common substrings have provided alternative ways to construct phylogenetic trees. However, most of these methods ignore the global similarities between sequences or some specific valuable features, e.g., frequent patterns over all datasets. To make further improvements, we propose an alignment-free algorithm based on sequential pattern mining, where each sequence is converted into a binary representation of sequential patterns among sequences. The phylogenetic tree is then constructed by clustering a distance matrix that is calculated from the pattern vectors. To increase accuracy for highly divergent sequences, we consider pattern weights and filter redundant sub-patterns. Both simulated and real data demonstrate that our method outperforms other alignment-free methods, especially for large sequence sets with low similarity.

10 pages, 1854 KiB  
Article
A Random Walk Based Cluster Ensemble Approach for Data Integration and Cancer Subtyping
by Chao Yang, Yu-Tian Wang and Chun-Hou Zheng
Genes 2019, 10(1), 66; https://doi.org/10.3390/genes10010066 - 18 Jan 2019
Cited by 4 | Viewed by 3675
Abstract
Availability of diverse types of high-throughput data increases the opportunities for researchers to develop computational methods to provide a more comprehensive view of the mechanism and therapy of cancer. One fundamental goal for oncology is to divide patients into subtypes with clinical and biological significance. Cluster ensemble fits this task exactly. It can improve the performance and robustness of clustering results by combining multiple basic clustering results. However, many existing cluster ensemble methods use a co-association matrix to summarize the co-occurrence statistics of the instance-cluster, where the relationship in the integration is only encapsulated at a rough level. Moreover, the relationship among clusters is completely ignored. Finding these missing associations could greatly expand the ability of cluster ensemble methods for cancer subtyping. In this paper, we propose the RWCE (Random Walk based Cluster Ensemble) to consider similarity among clusters. We first obtained a refined similarity between clusters by using random walk and a scaled exponential similarity kernel. Then, after being modeled as a bipartite graph, a more informative instance-cluster association matrix filled with the aforementioned cluster similarity was fed into a spectral clustering algorithm to get the final clustering result. We applied our method to six cancer types from The Cancer Genome Atlas (TCGA) and breast cancer from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). Experimental results show that our method is competitive against existing methods. A further case study demonstrates that our method has the potential to find subtypes with clinical and biological significance.

20 pages, 2927 KiB  
Article
A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction
by Lin Liu, Lin Tang, Xin Jin and Wei Zhou
Genes 2019, 10(1), 57; https://doi.org/10.3390/genes10010057 - 17 Jan 2019
Cited by 5 | Viewed by 3501
Abstract
With the continuous accumulation of biological data, more and more machine learning algorithms have been introduced into the field of gene function prediction, which has great significance in decoding the secret of life. Recently, a multi-label supervised topic model named labeled latent Dirichlet allocation (LLDA) has been applied to gene function prediction, and obtained more accurate and explainable predictions than conventional methods. Nonetheless, the LLDA model is only able to construct a bag of amino acid words as a classification feature, and does not support any other features, such as hydrophobicity, which has a profound impact on gene function. To achieve more accurate probabilistic modeling of gene function, we propose a multi-label supervised topic model conditioned on arbitrary features, named Dirichlet multinomial regression LLDA (DMR-LLDA), for introducing multiple types of features into the process of topic modeling. Based on the DMR framework, DMR-LLDA applies an exponential a priori construction, previously with weighted features, on the hyper-parameters of gene-topic distribution, so as to reflect the effects of extra features on function probability distribution. In the five-fold cross validation experiment on a yeast dataset, DMR-LLDA outperforms the compared model significantly. All of these experiments demonstrate the effectiveness and potential value of DMR-LLDA for predicting gene function.

14 pages, 8131 KiB  
Article
Network Analyses of Integrated Differentially Expressed Genes in Papillary Thyroid Carcinoma to Identify Characteristic Genes
by Junliang Shang, Qian Ding, Shasha Yuan, Jin-Xing Liu, Feng Li and Honghai Zhang
Genes 2019, 10(1), 45; https://doi.org/10.3390/genes10010045 - 14 Jan 2019
Cited by 8 | Viewed by 3666
Abstract
Papillary thyroid carcinoma (PTC) is the most common type of thyroid cancer. Identifying characteristic genes of PTC is of great importance for revealing its potential genetic mechanisms. In this paper, we proposed a framework, as well as a measure named Normalized Centrality Measure (NCM), to identify characteristic genes of PTC. The framework consisted of four steps. First, both up-regulated genes and down-regulated genes, collectively called differentially expressed genes (DEGs), were screened and integrated together from four datasets, that is, GSE3467, GSE3678, GSE33630, and GSE58545; second, an interaction network of DEGs was constructed, where each node represented a gene and each edge represented an interaction between the linked nodes; third, both traditional measures and the NCM measure were used to analyze the topological properties of each node in the network. Compared with traditional measures, more genes related to PTC were identified by the NCM measure; fourth, by mining the high-density subgraphs of this network and performing Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis, several meaningful results were captured, most of which were demonstrated to be associated with PTC. The experimental results proved that this network framework and the NCM measure are useful for identifying more characteristic genes of PTC.

10 pages, 1200 KiB  
Article
A Sequence-Based Novel Approach for Quality Evaluation of Third-Generation Sequencing Reads
by Wenjing Zhang, Neng Huang, Jiantao Zheng, Xingyu Liao, Jianxin Wang and Hong-Dong Li
Genes 2019, 10(1), 44; https://doi.org/10.3390/genes10010044 - 14 Jan 2019
Cited by 8 | Viewed by 3545
Abstract
The advent of third-generation sequencing (TGS) technologies, such as the Pacific Biosciences (PacBio) and Oxford Nanopore machines, provides new possibilities for contig assembly, scaffolding, and high-performance computing in bioinformatics due to their long reads. However, the high error rate and poor quality of TGS reads pose new challenges for accurate genome assembly and long-read alignment. Efficient processing methods are needed to prioritize high-quality reads for improving the results of error correction and assembly. In this study, we proposed a novel Read Quality Evaluation and Selection Tool (REQUEST) for evaluating the quality of third-generation long reads. REQUEST generates training data of high-quality and low-quality reads which are characterized by their nucleotide combinations. A linear regression model was built to score the quality of reads. The method was tested on three datasets of different species. The results showed that the top-scored reads prioritized by REQUEST achieved higher alignment accuracies. The contig assembly results based on the top-scored reads also outperformed conventional approaches that use all reads. REQUEST is able to distinguish high-quality reads from low-quality ones without using reference genomes, making it a promising alternative sequence-quality evaluation method to alignment-based algorithms.

15 pages, 2657 KiB  
Article
A Novel Method for Identifying Essential Genes by Fusing Dynamic Protein–Protein Interactive Networks
by Fengyu Zhang, Wei Peng, Yunfei Yang, Wei Dai and Junrong Song
Genes 2019, 10(1), 31; https://doi.org/10.3390/genes10010031 - 08 Jan 2019
Cited by 23 | Viewed by 3807
Abstract
Essential genes play an indispensable role in supporting the life of an organism. Identification of essential genes helps us to understand the underlying mechanism of cell life. The essential genes of bacteria are potential drug targets for some diseases. Recently, several computational methods have been proposed to detect essential genes based on static protein–protein interactive (PPI) networks. However, these methods have ignored the fact that essential genes play essential roles under certain conditions. In this work, a novel method was proposed for the identification of essential proteins by fusing the dynamic PPI networks of different time points (called FDP). First, the active PPI networks of each time point were constructed, and then they were fused into a final network according to the networks’ similarities. Finally, a novel centrality method was designed to assign each gene in the final network a ranking score, whilst considering its orthologous property and its global and local topological properties in the network. This model was applied to two different yeast data sets. The results showed that FDP achieved a better performance in essential gene prediction as compared to other existing methods that are based on the static PPI network or on dynamic networks.

14 pages, 3222 KiB  
Article
Gene-Based Nonparametric Testing of Interactions Using Distance Correlation Coefficient in Case-Control Association Studies
by Yingjie Guo, Chenxi Wu, Maozu Guo, Xiaoyan Liu and Alon Keinan
Genes 2018, 9(12), 608; https://doi.org/10.3390/genes9120608 - 05 Dec 2018
Cited by 3 | Viewed by 2856
Abstract
Among the various statistical methods for identifying gene–gene interactions in qualitative genome-wide association studies (GWAS), gene-based methods have recently grown in popularity because they confer advantages in both statistical power and biological interpretability. However, most of these methods make strong assumptions about the form of the relationship between traits and single-nucleotide polymorphisms, which result in limited statistical power. In this paper, we propose a gene-based method based on the distance correlation coefficient called gene-based gene–gene interaction via distance correlation coefficient (GBDcor). The distance correlation (dCor) is a measurement of the dependency between two random vectors with arbitrary, and not necessarily equal, dimensions. We used the difference in dCor in case and control datasets as an indicator of gene–gene interaction, which was based on the assumption that the joint distribution of two genes in case subjects and in control subjects should not be significantly different if the two genes do not interact. We designed a permutation-based statistical test to evaluate the difference between dCor in cases and controls for a pair of genes, and we provided the p-value for the statistic to represent the significance of the interaction between the two genes. In experiments with both simulated and real-world data, our method outperformed previous approaches in detecting interactions accurately.
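To make the distance correlation described in this abstract more concrete, the following Python sketch computes the standard sample dCor between two sets of paired observations whose dimensions may differ, together with a case-minus-control difference. It is an illustration under stated assumptions rather than the GBDcor implementation: the function names are hypothetical, the exact form of the paper's test statistic is not given in the abstract, and the permutation test is omitted.

```python
import math


def _centered_distances(samples):
    """Pairwise Euclidean distance matrix, double-centered."""
    n = len(samples)
    d = [[math.dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    # The distance matrix is symmetric, so row means equal column means.
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(xs, ys):
    """Sample distance correlation between paired observations xs and ys.

    Each observation is a tuple of floats; xs and ys may have different
    dimensions but must contain the same number of observations.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys need the same length, at least 2")
    n = len(xs)
    a = _centered_distances(xs)
    b = _centered_distances(ys)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0:
        return 0.0  # one of the inputs is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


def dcor_difference(case_gene1, case_gene2, control_gene1, control_gene2):
    """Hypothetical interaction indicator: dCor in cases minus dCor in controls."""
    return (distance_correlation(case_gene1, case_gene2)
            - distance_correlation(control_gene1, control_gene2))
```

In this sketch, each gene would be represented by one tuple of genotype values per subject, which is why the two inputs are allowed to have different numbers of coordinates.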

14 pages, 800 KiB  
Article
Co-differential Gene Selection and Clustering Based on Graph Regularized Multi-View NMF in Cancer Genomic Data
by Na Yu, Ying-Lian Gao, Jin-Xing Liu, Junliang Shang, Rong Zhu and Ling-Yun Dai
Genes 2018, 9(12), 586; https://doi.org/10.3390/genes9120586 - 28 Nov 2018
Cited by 26 | Viewed by 3476
Abstract
Cancer genomic data contain views from different sources that provide complementary information about genetic activity. This provides a new avenue for cancer research. Feature selection and multi-view clustering are hot topics in bioinformatics, and they can make full use of complementary information to improve performance. In this paper, a novel integrated model called Multi-view Non-negative Matrix Factorization (MvNMF) is proposed for the selection of common differential genes (co-differential genes) and multi-view clustering. In order to encode the geometric information in the multi-view genomic data, graph regularized MvNMF (GMvNMF) is further proposed by applying the graph regularization constraint in the objective function. GMvNMF can not only obtain the potential shared feature structure and shared cluster group structure, but also capture the manifold structure of multi-view data. The validity of the proposed GMvNMF method was tested on four multi-view genomic datasets. Experimental results showed that the GMvNMF method has better performance than other representative methods.
