Special Issue "Developments in Bioinformatic Algorithms"

A special issue of Biology (ISSN 2079-7737).

Deadline for manuscript submissions: closed (31 July 2013)

Special Issue Editor

Guest Editor
Prof. Dr. Sven Rahmann

Genome Informatics, Institute of Human Genetics, Faculty of Medicine, University of Duisburg-Essen, Germany
Interests: algorithmic bioinformatics; genome informatics; computational biology; next generation sequencing; resource-constrained data analysis; computational diagnostics and prognostics; biomarker discovery; network biology

Special Issue Information

Dear Colleagues,

Questions from molecular biology and genome research have inspired algorithm development in computer science for many years, starting with string algorithms for (small-scale) sequence analysis, and quickly extending to methods from graph and network algorithms, combinatorial optimization, computational statistics and image analysis, to name a few.
Today, new challenges arise from what is called the "data deluge" in biology: in some areas, such as next generation sequencing or bioimaging, the ability to produce data has grown much more rapidly than the ability to analyze and interpret the data sets. This situation calls for applying novel ideas from algorithm development to the data sets that each biology lab faces today.
This special issue calls for papers that illustrate this process, starting with a clearly motivated biological question, translating it into a formal problem, applying an appropriate algorithmic technique, and finally applying the solution to the original problem.
Relevant ideas from algorithmics include, but are not limited to, sublinear algorithms, streaming algorithms, compressed data structures, sampling techniques, or parallelization techniques.
Each submission should make an effort to present the algorithmic ideas first separately from their application. One goal of this special issue is to provide a broad overview of recent algorithmic developments in bioinformatics.

Prof. Dr. Sven Rahmann
Guest editor

Submission

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. Papers will be published continuously (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are refereed through a peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biology is an international peer-reviewed Open Access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 600 CHF (Swiss Francs). English correction and/or formatting fees of 250 CHF (Swiss Francs) will be charged in certain cases for those articles accepted for publication that require extensive additional formatting and/or English corrections.

Keywords

  • data deluge
  • algorithm development
  • algorithm engineering

Published Papers (7 papers)

Research

Open Access Article Local Similarity Search to Find Gene Indicators in Mitochondrial Genomes
Biology 2014, 3(1), 220-242; doi:10.3390/biology3010220
Received: 31 October 2013 / Revised: 15 February 2014 / Accepted: 18 February 2014 / Published: 11 March 2014
PDF Full-text (664 KB) | HTML Full-text | XML Full-text
Abstract
Given a set of nucleotide sequences, we consider the problem of identifying conserved substrings occurring in homologous genes in a large number of sequences. The problem is solved by identifying certain nodes in a suffix tree containing all substrings occurring in the given nucleotide sequences. Due to the large size of the targeted data set, our approach employs a truncated version of suffix trees. Two methods for this task are introduced: (1) the annotation-guided marker detection method uses gene annotations, which might contain a moderate number of errors; (2) the probability-based marker detection method determines sequences that appear significantly more often than expected. The approach is successfully applied to the mitochondrial nucleotide sequences and the corresponding annotations that are available in RefSeq for 2989 metazoan species. We demonstrate that the approach finds appropriate substrings.
(This article belongs to the Special Issue Developments in Bioinformatic Algorithms)
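As a rough illustration of the problem stated in this abstract (not the authors' method), the sketch below counts, for every substring up to a fixed length, how many input sequences contain it. A plain Python dictionary stands in for the truncated suffix tree, and the names `conserved_substrings`, `max_len`, and `min_fraction` are hypothetical.

```python
from collections import Counter


def conserved_substrings(sequences, max_len, min_fraction):
    """Return substrings (length <= max_len) found in enough sequences.

    A dictionary replaces the truncated suffix tree of the abstract:
    both index all substrings up to a fixed length, but this version
    is only meant for small illustrative inputs.
    """
    support = Counter()  # substring -> number of sequences containing it
    for seq in sequences:
        seen = set()  # count each substring once per sequence
        for start in range(len(seq)):
            for length in range(1, max_len + 1):
                if start + length > len(seq):
                    break
                seen.add(seq[start:start + length])
        support.update(seen)
    threshold = min_fraction * len(sequences)
    return {s: n for s, n in support.items() if n >= threshold}


if __name__ == "__main__":
    toy = ["ACGTAC", "TTACGA", "GGACGT"]
    shared = conserved_substrings(toy, max_len=3, min_fraction=1.0)
    print(sorted(s for s in shared if len(s) == 3))
```

With the toy input, "ACG" is the only substring of length three present in all three sequences. A real data set of thousands of genomes would need the space-efficient index that the abstract describes.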
Open Access Article Portraying the Expression Landscapes of B-Cell Lymphoma-Intuitive Detection of Outlier Samples and of Molecular Subtypes
Biology 2013, 2(4), 1411-1437; doi:10.3390/biology2041411
Received: 1 August 2013 / Revised: 1 October 2013 / Accepted: 5 November 2013 / Published: 2 December 2013
Cited by 4 | PDF Full-text (2597 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
We present an analytic framework based on Self-Organizing Map (SOM) machine learning to study large scale patient data sets. The potency of the approach is demonstrated in a case study using gene expression data of more than 200 mature aggressive B-cell lymphoma patients. The method portrays each sample with individual resolution, characterizes the subtypes, disentangles the expression patterns into distinct modules, extracts their functional context using enrichment techniques and enables investigation of the similarity relations between the samples. The method also makes it possible to detect and correct outliers caused by contamination. Based on our analysis, we propose a refined classification of B-cell lymphoma into four molecular subtypes, which are characterized by distinct functional and clinical characteristics.
(This article belongs to the Special Issue Developments in Bioinformatic Algorithms)
Open Access Article Dynamic Programming Used to Align Protein Structures with a Spectrum Is Robust
Biology 2013, 2(4), 1296-1310; doi:10.3390/biology2041296
Received: 3 October 2013 / Revised: 23 October 2013 / Accepted: 8 November 2013 / Published: 20 November 2013
PDF Full-text (649 KB) | HTML Full-text | XML Full-text
Abstract
Several efficient algorithms to conduct pairwise comparisons among large databases of protein structures have emerged in the recent literature. The central theme is the design of a measure between the Cα atoms of two protein chains, from which dynamic programming is used to compute an alignment. The efficiency and efficacy of these algorithms allow large-scale computational studies that were previously impractical. The computational study herein shows that the structural alignment algorithm eigen-decomposition alignment with the spectrum (EIGAs) is robust against both parametric and structural variation.
(This article belongs to the Special Issue Developments in Bioinformatic Algorithms)
Open Access Article Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions
Biology 2013, 2(4), 1282-1295; doi:10.3390/biology2041282
Received: 28 June 2013 / Revised: 8 October 2013 / Accepted: 5 November 2013 / Published: 8 November 2013
PDF Full-text (344 KB) | HTML Full-text | XML Full-text
Abstract
Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model.
(This article belongs to the Special Issue Developments in Bioinformatic Algorithms)

Review

Open Access Review R-Based Software for the Integration of Pathway Data into Bioinformatic Algorithms
Biology 2014, 3(1), 85-100; doi:10.3390/biology3010085
Received: 7 November 2013 / Revised: 29 November 2013 / Accepted: 31 January 2014 / Published: 7 February 2014
Cited by 6 | PDF Full-text (590 KB) | HTML Full-text | XML Full-text
Abstract
Putting new findings into the context of available literature knowledge is one approach to dealing with the surge of high-throughput data results. Furthermore, prior knowledge can increase the performance and stability of bioinformatic algorithms, for example, methods for network reconstruction. In this review, we examine software packages for the statistical computing framework R, which enable the integration of pathway data for further bioinformatic analyses. Different approaches to integrate and visualize pathway data are identified, and packages are stratified by their features according to a number of different aspects: data import strategies, the extent of available data, dependencies on external tools, integration with further analysis steps and visualization options. A total of 12 packages integrating pathway data are reviewed in this manuscript. These are supplemented by five R-specific packages for visualization and six connector packages, which provide access to external tools.
(This article belongs to the Special Issue Developments in Bioinformatic Algorithms)
Open Access Review Algorithmic Perspectives of Network Transitive Reduction Problems and their Applications to Synthesis and Analysis of Biological Networks
Biology 2014, 3(1), 1-21; doi:10.3390/biology3010001
Received: 19 July 2013 / Revised: 11 November 2013 / Accepted: 9 December 2013 / Published: 19 December 2013
PDF Full-text (361 KB) | HTML Full-text | XML Full-text
Abstract
In this survey paper, we will present a number of core algorithmic questions concerning several transitive reduction problems on networks that have applications in network synthesis and analysis involving cellular processes. Our starting point will be the so-called minimum equivalent digraph problem, a classic computational problem in combinatorial algorithms. We will subsequently consider a few non-trivial extensions or generalizations of this problem motivated by applications in systems biology. We will then discuss the applications of these algorithmic methodologies in the context of three major biological research questions: synthesizing and simplifying signal transduction networks, analyzing disease networks, and measuring redundancy of biological networks.
(This article belongs to the Special Issue Developments in Bioinformatic Algorithms)
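To make the idea of a transitive reduction concrete, here is a minimal sketch for the simplest setting only: a directed acyclic graph without self-loops, where the reduction is unique. It is not taken from the survey, which addresses harder variants (with cycles, the minimum equivalent digraph problem is NP-hard). The function name `transitive_reduction` and the edge-list input format are assumptions made for this example.

```python
def transitive_reduction(edges):
    """Drop every edge (u, v) of a DAG that is implied by a longer path.

    Assumes the input is acyclic and has no self-loops; under that
    assumption an edge is redundant exactly when v can still be
    reached from u without using the edge itself.
    """
    succ = {}  # adjacency sets: node -> direct successors
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())

    def reachable(start, target, skip_edge):
        """Depth-first search from start to target, ignoring skip_edge."""
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            for nxt in succ[node]:
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == target:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    return {(u, v) for u, v in set(edges) if not reachable(u, v, (u, v))}


if __name__ == "__main__":
    # The direct edge a -> c is implied by the path a -> b -> c.
    print(sorted(transitive_reduction([("a", "b"), ("b", "c"), ("a", "c")])))
```

This version runs one graph search per edge, which is fine for small signalling networks but not tuned for large ones.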
Open Access Review Algorithms for Computing the Triplet and Quartet Distances for Binary and General Trees
Biology 2013, 2(4), 1189-1209; doi:10.3390/biology2041189
Received: 15 July 2013 / Revised: 29 August 2013 / Accepted: 13 September 2013 / Published: 26 September 2013
Cited by 1 | PDF Full-text (1353 KB) | HTML Full-text | XML Full-text
Abstract
Distance measures between trees are useful for comparing trees in a systematic manner, and several different distance measures have been proposed. The triplet and quartet distances, for rooted and unrooted trees, respectively, are defined as the number of subsets of three or four leaves, respectively, where the topologies of the induced subtrees differ. These distances can trivially be computed by explicitly enumerating all sets of three or four leaves and testing if the topologies are different, but this leads to time complexities at least of the order n³ or n⁴ just for enumerating the sets. The different topologies can be counted implicitly, however, and in this paper, we review a series of algorithmic improvements that have been used during the last decade to develop more efficient algorithms by exploiting two different strategies for this: one based on dynamic programming and another based on coloring leaves in one tree and updating a hierarchical decomposition of the other.
(This article belongs to the Special Issue Developments in Bioinformatic Algorithms)
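The trivial enumeration mentioned in this abstract can be sketched as follows for the triplet distance between two rooted trees. This is the cubic-time baseline, not one of the improved algorithms the review covers. The nested-tuple tree encoding and all function names are assumptions chosen for this example; leaf labels must not themselves be tuples.

```python
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf label to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (i,)))
    return paths


def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the shared prefix."""
    depth = 0
    for x, y in zip(p, q):
        if x != y:
            break
        depth += 1
    return depth


def triplet_topology(paths, a, b, c):
    """Return the pair that is grouped together, or None if unresolved."""
    ab = lca_depth(paths[a], paths[b])
    ac = lca_depth(paths[a], paths[c])
    bc = lca_depth(paths[b], paths[c])
    if ab > ac:
        return frozenset((a, b))
    if ac > ab:
        return frozenset((a, c))
    if bc > ab:
        return frozenset((b, c))
    return None  # all three leaves split at the same node


def triplet_distance(tree1, tree2):
    """Count leaf triples whose induced topologies differ (cubic time)."""
    p1, p2 = leaf_paths(tree1), leaf_paths(tree2)
    if set(p1) != set(p2):
        raise ValueError("trees must have the same leaf set")
    return sum(
        triplet_topology(p1, a, b, c) != triplet_topology(p2, a, b, c)
        for a, b, c in combinations(sorted(p1), 3)
    )


if __name__ == "__main__":
    t1 = ((("a", "b"), "c"), "d")
    t2 = ((("a", "c"), "b"), "d")
    print(triplet_distance(t1, t2))  # only the triple {a, b, c} differs
```

Because the loop visits every one of the n-choose-3 triples, the running time grows cubically with the number of leaves, which is the cost the reviewed algorithms avoid.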

Journal Contact

MDPI AG
Biology Editorial Office
St. Alban-Anlage 66, 4052 Basel, Switzerland
biology@mdpi.com
Tel. +41 61 683 77 34
Fax: +41 61 302 89 18