Special Issue "Selected Papers from the Second CCF Bioinformatics Conference (CBC 2017)"

A special issue of Molecules (ISSN 1420-3049).

Deadline for manuscript submissions: closed (10 November 2017)

Special Issue Editors

Guest Editor
Prof. Dr. Min Li

School of Information Science and Engineering, Central South University, Changsha 410083, China
Interests: bioinformatics; systems biology; genomic and proteomic data analysis; biological network analysis; disease association prediction
Guest Editor
Prof. Dr. Quan Zou

School of Computer Science and Technology, Tianjin University, Tianjin 300350, China
Interests: bioinformatics; protein structure prediction; protein-protein interaction; special protein identification; machine learning; noncoding RNA

Special Issue Information

Dear Colleagues,

The Second CCF Bioinformatics Conference (CBC2017), organized by the China Computer Federation, will be held in Changsha, China, 13–15 October, 2017. The conference is supported and sponsored by Central South University, WeGene Company (Shenzhen), Sugon Information Industry Co., Beijing Zhongkejingyun Co., and the National Natural Science Foundation of China (NSFC).

Bioinformatics has become an intensive research topic over the past decade and has attracted many leading scientists working in biology, physics, mathematics, and computer science. Optimization, statistics, algorithms, and many other informatics methods are widely used in the field.

Following the success of the CBC conference series, which began in 2016, CBC 2017 aims to extend the international forum where scientists, researchers, educators, and practitioners exchange ideas and approaches and present research findings and state-of-the-art solutions in this interdisciplinary field. This includes theoretical methodological developments, their applications in the biosciences, and research on various aspects of bioinformatics. Leading speakers from China will present their results. For full details, including a complete list of presenters, please see http://bioinformatics.csu.edu.cn/resources/CBC2017/.

Prof. Dr. Min Li
Prof. Dr. Quan Zou
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, authors can proceed to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts are available on the Instructions for Authors page. Molecules is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • bioinformatics

  • machine learning

  • systems biology

  • biological networks

  • computational biology

Published Papers (12 papers)


Research

Open Access Article: The Integrative Method Based on the Module-Network for Identifying Driver Genes in Cancer Subtypes
Molecules 2018, 23(2), 183; doi:10.3390/molecules23020183
Received: 7 November 2017 / Revised: 29 December 2017 / Accepted: 8 January 2018 / Published: 24 January 2018
Abstract
With advances in next-generation sequencing (NGS) technologies, large amounts of high-throughput genomic data of multiple types have become available. A major challenge in exploring cancer progression is to identify driver genes among variant genes by analyzing and integrating multiple types of genomic data. Breast cancer is known to be a heterogeneous disease, and identifying subtype-specific driver genes is critical for guiding the diagnosis, prognostic assessment, and treatment of breast cancer. We developed an integrative framework based on gene expression profiles and copy number variation (CNV) data to identify breast cancer subtype-specific driver genes. In this framework, we employed a statistical machine-learning method to select gene subsets and used a module-network analysis method to identify potential candidate driver genes. The final subtype-specific driver genes were obtained by pairwise comparison between subtypes. To validate the specificity of the driver genes, the expression data of these genes were used to classify patient samples with 10-fold cross-validation, and enrichment analysis was also conducted on the identified driver genes. The experimental results show that the proposed integrative method can identify potential driver genes, and that a classifier built on these genes performed better than classifiers built on genes identified by other methods.
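The abstract validates the driver genes with 10-fold cross-validation. As a rough illustration only (not the authors' pipeline), the sketch below generates the train/test index splits for k-fold cross-validation in plain Python. The function name `k_fold_indices` is hypothetical, and the folds are not stratified by subtype, which a real subtype classifier would likely need.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Sample indices are shuffled once and dealt round-robin into k folds whose
    sizes differ by at most one; each fold serves as the test set exactly once.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Training set = every index that is not in the current test fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

For example, `list(k_fold_indices(25, k=10))` yields ten splits whose test folds partition the 25 sample indices, with five folds of three samples and five of two.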

Open Access Article: Selecting Feature Subsets Based on SVM-RFE and the Overlapping Ratio with Applications in Bioinformatics
Molecules 2018, 23(1), 52; doi:10.3390/molecules23010052
Received: 10 November 2017 / Revised: 15 December 2017 / Accepted: 16 December 2017 / Published: 26 December 2017
Abstract
Feature selection is an important topic in bioinformatics. Identifying informative features in complex, high-dimensional biological data is critical in disease research, drug development, and related areas. Support vector machine-recursive feature elimination (SVM-RFE) is an efficient feature selection technique that has proven powerful in many applications. It ranks features by the order in which they are recursively eliminated based on the SVM. In this study, we propose a method, SVM-RFE-OA, which combines the classification accuracy rate and the average overlapping ratio of the samples to determine how many features to select from the SVM-RFE feature ranking. Furthermore, to measure feature weights more accurately, we propose a modified SVM-RFE-OA (M-SVM-RFE-OA) algorithm that temporarily screens out samples lying in heavily overlapping areas in each iteration. Experiments on eight public biological datasets show that combining the classification accuracy rate with the average overlapping degree of the samples measures the discriminative ability of a feature subset more accurately than using the classification accuracy rate alone, and that shielding samples in the overlapping area makes the calculation of feature weights more stable and accurate. The proposed methods can also be used with other RFE techniques to identify potential biomarkers in big biological data.
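SVM-RFE, which SVM-RFE-OA builds on, repeatedly trains a linear SVM and discards the feature with the smallest squared weight. The following is a minimal, dependency-free sketch of plain SVM-RFE under stated assumptions: the linear SVM is trained with the Pegasos subgradient method without a bias term or the optional projection step (so features are assumed to be roughly centered), the helper names are hypothetical, and neither the overlapping-ratio criterion nor the sample screening of M-SVM-RFE-OA is implemented.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a bias-free linear SVM with the Pegasos stochastic subgradient method.

    X is a list of feature vectors and y a list of labels in {-1, +1}.
    Returns the weight vector w of the decision function sign(w . x).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Regularization step: shrink w toward zero.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss step: only samples inside the margin contribute.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_rfe_ranking(X, y, **svm_kwargs):
    """Rank features from most to least important with SVM-RFE.

    Each round trains a linear SVM on the surviving features and eliminates
    the one with the smallest squared weight; the reversed elimination order
    is the ranking.
    """
    remaining = list(range(len(X[0])))
    eliminated = []
    while remaining:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y, **svm_kwargs)
        weakest = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        eliminated.append(remaining.pop(weakest))
    return eliminated[::-1]
```

Removing one feature per round keeps the sketch simple; practical SVM-RFE implementations often remove a fixed fraction of features per round to reduce training time on high-dimensional data.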
Open Access Article: Extracting Fitness Relationships and Oncogenic Patterns among Driver Genes in Cancer
Molecules 2018, 23(1), 39; doi:10.3390/molecules23010039
Received: 9 November 2017 / Revised: 13 December 2017 / Accepted: 18 December 2017 / Published: 25 December 2017
Abstract
Driver mutations provide a fitness advantage to cancer cells; their accumulation increases the fitness of cancer cells and accelerates cancer progression. This work seeks to extract the patterns in which driver genes accumulate (“fitness relationships”) during tumorigenesis. We introduce a network-based method for extracting the fitness relationships of driver genes by modeling the network properties of the “fitness” of cancer cells. Colon adenocarcinoma (COAD) and skin cutaneous malignant melanoma (SKCM) are employed as case studies. Consistent results derived from different background networks suggest that the identified fitness relationships are reliable. Co-occurrence analysis and pathway analysis further reveal the functional significance of the fitness relationships in signal transduction. In addition, a subset of driver genes, called the “fitness core”, is identified for each case. Further analyses indicate the functional importance of the fitness core in carcinogenesis and point to potential therapeutic opportunities for medicinal intervention. Fitness relationships characterize the functional continuity among driver genes in carcinogenesis, offer new insights into the oncogenic mechanisms of cancers, and provide guiding information for medicinal intervention.
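The abstract mentions co-occurrence analysis of driver genes without specifying the statistic. As an assumed, illustrative starting point only, the sketch below compares how often two genes are mutated in the same sample with the count expected if they were mutated independently. The input format and the function name `co_occurrence_ratios` are hypothetical, and no significance test (such as Fisher's exact test) is applied.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_ratios(mutations):
    """Observed / expected co-mutation ratio for every co-mutated gene pair.

    mutations maps a sample ID to the set of driver genes mutated in it.
    The expected count assumes independent mutation across samples:
    E[a, b] = n_a * n_b / N. Ratios above 1 hint at co-occurrence, ratios
    below 1 at mutual exclusivity. Pairs never mutated together are omitted.
    """
    n_samples = len(mutations)
    gene_counts = Counter(g for genes in mutations.values() for g in genes)
    pair_counts = Counter(
        pair
        for genes in mutations.values()
        for pair in combinations(sorted(genes), 2)
    )
    return {
        (a, b): observed / (gene_counts[a] * gene_counts[b] / n_samples)
        for (a, b), observed in pair_counts.items()
    }
```

For example, with four hypothetical samples in which APC is mutated in three and TP53 in two, and both together in two, the APC-TP53 ratio is 2 / (3 × 2 / 4) ≈ 1.33.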

Open Access Article: HIGA: A Running History Information Guided Genetic Algorithm for Protein–Ligand Docking
Molecules 2017, 22(12), 2233; doi:10.3390/molecules22122233
Received: 8 November 2017 / Revised: 3 December 2017 / Accepted: 12 December 2017 / Published: 15 December 2017
Abstract
Protein-ligand docking is an essential part of computer-aided drug design; it identifies the binding patterns of proteins and ligands by computer simulation. Although the Lamarckian genetic algorithm (LGA) has demonstrated excellent performance on protein-ligand docking problems, it cannot memorize the history information it has accessed, which makes discovering promising solutions costly. This article presents a novel optimization algorithm (HIGA), based on LGA, for solving protein-ligand docking problems with the aim of overcoming this drawback. The method applies a running history information guided model, which includes CE crossover, ED mutation, and a BSP tree. The new algorithm is more efficient at finding the lowest protein-ligand docking energy. We evaluate the performance of HIGA in comparison with GA, LGA, EDGA, CEPGA, SODOCK, and ABC, and the results indicate that HIGA outperforms the other search algorithms.

Open Access Article: Multi-Objective Optimization Algorithm to Discover Condition-Specific Modules in Multiple Networks
Molecules 2017, 22(12), 2228; doi:10.3390/molecules22122228
Received: 27 October 2017 / Revised: 10 December 2017 / Accepted: 11 December 2017 / Published: 14 December 2017
Abstract
Advances in biological technologies make it possible to generate data for multiple conditions simultaneously. Discovering condition-specific modules in multiple networks is of great value for understanding the underlying molecular mechanisms of cells. Available algorithms transform the multiple networks into a single-objective optimization problem, an approach criticized for its low accuracy. To address this issue, we develop a multi-objective genetic algorithm for condition-specific modules in multiple networks (MOGA-CSM). Using artificial networks, we demonstrate that MOGA-CSM outperforms state-of-the-art methods in terms of accuracy. Furthermore, MOGA-CSM discovers stage-specific modules in breast cancer networks built from The Cancer Genome Atlas (TCGA) data, and these modules serve as biomarkers for predicting breast cancer stages. The proposed model and algorithm provide an effective way to analyze multiple networks.
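Multi-objective genetic algorithms compare candidate solutions by Pareto dominance rather than by a single scalar score, which is what distinguishes them from the single-objective formulations criticized above. The abstract does not give MOGA-CSM's objectives or selection scheme, so the sketch below is only a generic illustration. Each candidate module is represented by a hypothetical tuple of objective values to be maximized (for example, one score per condition), and candidates are ranked with the classic Fonseca–Fleming MOGA rule: 1 plus the number of candidates that dominate it.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized):
    a is at least as good in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def moga_ranks(candidates):
    """Fonseca-Fleming MOGA rank: 1 + the number of candidates dominating each one."""
    return [1 + sum(dominates(other, c) for other in candidates) for c in candidates]


def pareto_front(candidates):
    """Candidates that no other candidate dominates (those with rank 1)."""
    return [c for c, r in zip(candidates, moga_ranks(candidates)) if r == 1]
```

For the hypothetical objective vectors `[(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9), (0.9, 0.1)]`, the front keeps the first, second, and fourth vectors: each of the other two is beaten in both objectives or tied in one and beaten in the other.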

Open Access Article: Developing an Agent-Based Drug Model to Investigate the Synergistic Effects of Drug Combinations
Molecules 2017, 22(12), 2209; doi:10.3390/molecules22122209
Received: 27 October 2017 / Revised: 6 December 2017 / Accepted: 7 December 2017 / Published: 14 December 2017
Abstract
The growth and survival of cancer cells are closely related to their surrounding microenvironment. To understand the regulation under the impact of anti-cancer drugs and their synergistic effects, we have developed a multiscale agent-based model for investigating the synergistic effects of drug combinations, with three innovations. First, it explores the synergistic effects of drug combinations in a large dose-combination space at the cell line level. Second, it can simulate the interactions between cells and their microenvironment. Third, it employs both local and global optimization algorithms to train the key parameters, and it validates the model's predictive power against experimental data. The results indicate that our multicellular system can not only describe the interactions between the microenvironment and cells in detail but also predict the synergistic effects of drug combinations.

Open Access Article: Detection of Network Motif Based on a Novel Graph Canonization Algorithm from Transcriptional Regulation Networks
Molecules 2017, 22(12), 2194; doi:10.3390/molecules22122194
Received: 4 November 2017 / Revised: 28 November 2017 / Accepted: 5 December 2017 / Published: 10 December 2017
Abstract
Network motifs are patterns that occur significantly more frequently in complex networks than in random networks. They are considered fundamental building blocks of complex networks, so detecting network motifs in transcriptional regulation networks is a crucial step in understanding the mechanism of transcriptional regulation and network evolution. The search for network motifs resembles subgraph searching problems, which have been proven to be NP-complete. To count the subgraphs of large biological networks quickly and effectively, we propose a novel graph canonization algorithm based on resolving sets. The method has been implemented in a command line interface (CLI) program, sgip, using the SeqAn library. Compared with Babai’s algorithm, this approach has a tighter complexity bound, $o(\exp(n \log^2 n + 4 \log n))$, on strongly regular graphs. Results on several simulated datasets and transcriptional regulation networks indicate that sgip outperforms nauty in many graph cases. The source code of sgip is freely available at https://github.com/seqan/seqan/tree/master/apps/sgip and the binary code at http://packages.seqan.de/sgip/.
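sgip's canonization is based on resolving sets. A vertex set W resolves a graph if every vertex has a unique vector of shortest-path distances to the members of W, so W acts as a set of landmarks that identifies each vertex. The sketch below only checks this defining property for an unweighted, undirected graph given as a hypothetical adjacency dictionary; it is not sgip's canonization algorithm, and the function names are illustrative.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path (hop) distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_resolving_set(adj, landmarks):
    """True if every vertex has a distinct vector of distances to the landmarks.

    Unreachable vertices get an infinite distance in their signature.
    """
    dists = [bfs_distances(adj, w) for w in landmarks]
    seen = set()
    for v in adj:
        signature = tuple(d.get(v, float("inf")) for d in dists)
        if signature in seen:
            return False
        seen.add(signature)
    return True
```

For example, one endpoint resolves a path graph, because distances along the path are all different. In a 4-cycle, a single vertex sees both of its neighbours at distance 1, so two adjacent vertices are needed.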

Open Access Article: A Seed Expansion Graph Clustering Method for Protein Complexes Detection in Protein Interaction Networks
Molecules 2017, 22(12), 2179; doi:10.3390/molecules22122179
Received: 9 November 2017 / Revised: 3 December 2017 / Accepted: 3 December 2017 / Published: 8 December 2017
Abstract
Most proteins perform their biological functions by interacting in complexes. Detecting protein complexes is important not only for understanding the relationship between the functions and structures of biological networks but also for predicting the functions of uncharacterized proteins. We present a new nodal metric that integrates each node's local topological information. The metric reflects how well a node represents a cluster within a larger local neighborhood of a protein–protein interaction (PPI) network. Based on this metric, we propose a seed-expansion graph clustering algorithm (SEGC) for detecting protein complexes in PPI networks. A roulette wheel strategy is used to select seeds, which enhances the diversity of the clustering. For a candidate node u, we define its closeness to a cluster C, denoted NC(u, C), by combining the density of C and the connections between u and C. In SEGC, a cluster that initially consists of only a seed node is extended by recursively adding nodes from its neighbors according to their closeness, until no neighbor passes the expansion test. We compare the F-measure and accuracy of SEGC with those of other algorithms on Saccharomyces cerevisiae protein interaction networks. The experimental results show that SEGC outperforms the other algorithms under full coverage.
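The seed-expansion loop described above can be sketched as follows. Several parts are assumptions, because the abstract does not give the formulas. Seeds are drawn by roulette wheel with degree as a hypothetical stand-in for SEGC's nodal metric. The closeness function is a hypothetical stand-in for NC(u, C): the density of C with u added, multiplied by the fraction of C's members adjacent to u. The cluster grows by one best-scoring neighbor at a time while the score stays at or above an assumed threshold.

```python
import random


def roulette_select(weights, rng):
    """Roulette wheel: pick a key with probability proportional to its weight."""
    total = sum(weights.values())
    r = rng.random() * total
    acc = 0.0
    for key, w in weights.items():
        acc += w
        if r < acc:
            return key
    return key  # guard against floating-point round-off


def density(adj, nodes):
    """Edge density of the subgraph induced by `nodes` (adj maps node -> set)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(len(adj[u] & nodes) for u in nodes) / 2
    return 2 * edges / (n * (n - 1))


def closeness(adj, u, cluster):
    """HYPOTHETICAL stand-in for NC(u, C): density of C plus u, times the
    fraction of C's members that u is adjacent to."""
    return density(adj, cluster | {u}) * len(adj[u] & cluster) / len(cluster)


def expand_seed(adj, seed, threshold=0.5):
    """Grow a cluster from `seed`, adding the closest neighbor while it passes."""
    cluster = {seed}
    while True:
        frontier = set().union(*(adj[u] for u in cluster)) - cluster
        scores = {v: closeness(adj, v, cluster) for v in frontier}
        passing = sorted(v for v, s in scores.items() if s >= threshold)
        if not passing:
            return cluster
        cluster.add(max(passing, key=scores.get))


def seed_expansion_clusters(adj, n_clusters, threshold=0.5, seed=0):
    """Pick seeds by roulette wheel (degree-weighted, an assumption) and expand them."""
    rng = random.Random(seed)
    weights = {v: float(len(nbrs)) for v, nbrs in adj.items()}
    clusters = []
    while weights and len(clusters) < n_clusters:
        s = roulette_select(weights, rng)
        cluster = expand_seed(adj, s, threshold)
        clusters.append(cluster)
        for v in cluster:  # covered nodes are no longer seed candidates
            weights.pop(v, None)
    return clusters
```

On a toy graph made of a 4-clique attached by one edge to a triangle, expanding from a clique node recovers the clique, and expanding from a triangle node other than the bridging one recovers the triangle. The roulette wheel makes the seed order, and so the final clustering, vary with the random seed, which is the diversity the abstract mentions.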

Open Access Article: A Robust Manifold Graph Regularized Nonnegative Matrix Factorization Algorithm for Cancer Gene Clustering
Molecules 2017, 22(12), 2131; doi:10.3390/molecules22122131
Received: 27 October 2017 / Revised: 27 November 2017 / Accepted: 29 November 2017 / Published: 2 December 2017
Abstract
Detecting genes with similar expression patterns using clustering techniques plays an important role in gene expression data analysis. Non-negative matrix factorization (NMF) is an effective method for the cluster analysis of gene expression data. However, NMF-based methods operate in Euclidean space, which is usually inappropriate for revealing the intrinsic geometric structure of the data space. To overcome this shortcoming, Cai et al. proposed a novel algorithm called graph regularized non-negative matrix factorization (GNMF). Motivated by the topological structure of GNMF-based methods, we propose an improved GNMF that better reveals the geometric structure of the data space. This robust manifold non-negative matrix factorization (RM-GNMF) is designed for cancer gene clustering and enhances the robustness of the GNMF-based algorithm. We combine $\ell_{2,1}$-norm NMF with spectral clustering and conduct wide-ranging experiments on three known datasets. The clustering results indicate that the proposed method outperforms previous methods, demonstrating the applicability of RM-GNMF to cancer gene clustering.
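For reference, the sketch below implements the baseline that RM-GNMF improves on: GNMF with the multiplicative update rules published by Cai et al., written in dependency-free Python. It is an illustration under assumptions, not RM-GNMF itself. The robust $\ell_{2,1}$-norm loss and the spectral clustering step mentioned in the abstract are not implemented. The sample graph W is supplied by the caller, and the helper names are hypothetical. Here the columns of X (samples) are clustered by taking the largest coefficient in each row of V; transposing X would cluster the rows instead.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def gnmf(X, W, k, lam=1.0, iters=300, seed=0, eps=1e-10):
    """Graph-regularized NMF with Cai et al.'s multiplicative updates.

    X: m x n non-negative data; W: n x n symmetric non-negative affinity graph
    over the n columns. Finds U (m x k) and V (n x k) approximately minimizing
    ||X - U V^T||_F^2 + lam * tr(V^T L V), with L = D - W and D = diag(row sums of W).
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    U = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    V = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    degree = [sum(row) for row in W]  # diagonal of D
    Xt = transpose(X)
    for _ in range(iters):
        # U <- U * (X V) / (U V^T V)
        num = matmul(X, V)
        den = matmul(U, matmul(transpose(V), V))
        U = [[U[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(m)]
        # V <- V * (X^T U + lam W V) / (V U^T U + lam D V)
        XtU = matmul(Xt, U)
        WV = matmul(W, V)
        VUtU = matmul(V, matmul(transpose(U), U))
        V = [[V[j][a] * (XtU[j][a] + lam * WV[j][a])
              / (VUtU[j][a] + lam * degree[j] * V[j][a] + eps)
              for a in range(k)]
             for j in range(n)]
    return U, V


def cluster_labels(V):
    """Assign each column of X to the factor with the largest coefficient."""
    return [max(range(len(row)), key=row.__getitem__) for row in V]
```

The multiplicative form keeps U and V non-negative as long as they start positive, which is why the initial entries are offset away from zero.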

Open Access Article: An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer
Molecules 2017, 22(12), 2116; doi:10.3390/molecules22122116
Received: 25 October 2017 / Accepted: 29 November 2017 / Published: 1 December 2017
Abstract
Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing already plays an active part in big data processing with the help of big data frameworks such as Hadoop and Spark. The recent upsurge of high-performance computing in China provides additional possibilities and capacity for addressing the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer that enables big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. Through automated configuration, it minimizes the effort needed to launch big data applications on the Tianhe-2 supercomputer. Orion follows the “allocate-when-needed” paradigm, which avoids idle occupation of computational resources. We tested the utility and performance of Orion on a large genomic dataset and achieved satisfactory performance on Tianhe-2 with very few modifications to existing applications implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.

Open Access Article: Cancer Classification Based on Support Vector Machine Optimized by Particle Swarm Optimization and Artificial Bee Colony
Molecules 2017, 22(12), 2086; doi:10.3390/molecules22122086
Received: 27 October 2017 / Accepted: 23 November 2017 / Published: 29 November 2017
Abstract
Intelligent optimization algorithms are well suited to complex nonlinear problems and offer good flexibility and adaptability. In this paper, the FCBF (Fast Correlation-Based Feature selection) method is used to filter out irrelevant and redundant features in order to improve the quality of cancer classification. We then perform classification using an SVM (Support Vector Machine) optimized by PSO (Particle Swarm Optimization) combined with ABC (Artificial Bee Colony), denoted PA-SVM. The proposed PA-SVM method is applied to nine cancer datasets, including five outcome-prediction datasets and a protein dataset of ovarian cancer. Comparison with other classification methods demonstrates the effectiveness and robustness of the proposed PA-SVM method in handling various types of data for cancer classification.
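FCBF scores features with symmetric uncertainty (SU), a normalized form of mutual information. It keeps features relevant to the class and drops any feature for which an already-selected, more relevant feature is at least as predictive as the class. The sketch below illustrates that filtering step only, assuming features are already discretized. The threshold `delta`, the input format, and the function names are illustrative, and the PSO/ABC-optimized SVM is not shown.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: treat as uninformative
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def fcbf(features, labels, delta=0.0):
    """Fast Correlation-Based Filter over discrete features.

    features maps a feature name to its list of discretized values. Features
    with SU(feature, class) above delta are visited in decreasing relevance;
    a feature is dropped as redundant when an already-selected feature has
    SU with it at least as large as its own SU with the class.
    """
    relevance = {f: symmetric_uncertainty(v, labels) for f, v in features.items()}
    candidates = sorted((f for f in features if relevance[f] > delta),
                        key=lambda f: relevance[f], reverse=True)
    selected = []
    for f in candidates:
        if all(symmetric_uncertainty(features[s], features[f]) < relevance[f]
               for s in selected):
            selected.append(f)
    return selected
```

For instance, if two features are exact copies of the class labels and a third is constant, only the first copy is selected. The second is redundant with it, and the constant feature has zero relevance.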

Open Access Article: Deep Convolutional Neural Network-Based Early Automated Detection of Diabetic Retinopathy Using Fundus Image
Molecules 2017, 22(12), 2054; doi:10.3390/molecules22122054
Received: 10 November 2017 / Revised: 20 November 2017 / Accepted: 22 November 2017 / Published: 23 November 2017
Abstract
The automatic detection of diabetic retinopathy is of vital importance, as diabetic retinopathy is the main cause of irreversible vision loss among the working-age population in the developed world. Early detection of diabetic retinopathy can be very helpful for clinical treatment. Although several feature extraction approaches have been proposed, classifying retinal images remains tedious even for trained clinicians. Recently, deep convolutional neural networks have shown superior performance in image classification compared with earlier methods based on handcrafted features. In this paper, we therefore explored the use of deep convolutional neural networks for the automatic classification of diabetic retinopathy from color fundus images. We obtained an accuracy of 94.5% on our dataset, outperforming the results of classical approaches.
