Special Issue "Developments in Bioinformatic Algorithms"
A special issue of Biology (ISSN 2079-7737).
Deadline for manuscript submissions: 31 July 2013
Prof. Dr. Sven Rahmann
Genome Informatics, Institute of Human Genetics, Faculty of Medicine, University of Duisburg-Essen, Germany
Interests: algorithmic bioinformatics; genome informatics; computational biology; next generation sequencing; resource-constrained data analysis; computational diagnostics and prognostics; biomarker discovery; network biology
Questions from molecular biology and genome research have inspired algorithm development in computer science for many years, starting with string algorithms for (small-scale) sequence analysis, and quickly extending to methods from graph and network algorithms, combinatorial optimization, computational statistics and image analysis, to name a few.
Today, new challenges arise from what is called the "data deluge" in biology: in some areas, such as next generation sequencing or bioimaging, the ability to produce data has grown much more rapidly than the ability to analyze and interpret the data sets. This situation calls for applying novel ideas from algorithm development to the data sets that each biology lab faces today.
This special issue calls for papers that illustrate this process, starting with a clearly motivated biological question, translating it into a formal problem, applying an appropriate algorithmic technique, and finally applying the solution to the original problem.
Relevant ideas from algorithmics include, but are not limited to, sublinear algorithms, streaming algorithms, compressed data structures, sampling techniques, or parallelization techniques.
Each submission should make an effort to present the algorithmic ideas first separately from their application. One goal of this special issue is to provide a broad overview of recent algorithmic developments in bioinformatics.
Prof. Dr. Sven Rahmann
Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. Papers will be published continuously (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.
Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are refereed through a peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biology is an international peer-reviewed Open Access quarterly journal published by MDPI.
Please visit the Instructions for Authors page before submitting a manuscript. For the first couple of issues the Article Processing Charge (APC) will be waived for well-prepared manuscripts. English correction and/or formatting fees of 250 CHF (Swiss Francs) will be charged in certain cases for those articles accepted for publication that require extensive additional formatting and/or English corrections.
- data deluge
- algorithm development
- algorithm engineering
The list below represents only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.
Type of Paper: Article
Title: Local Similarity Search to Find Gene Indicators in Mitochondrial Genomes
Authors: R. L. V. Moritz, M. Bernt and M. Middendorf
Affiliation: Parallel Computing and Complex Systems, Faculty of Mathematics and Computer Science, Universität Leipzig; E-Mail: firstname.lastname@example.org (R.L.V.M.); email@example.com (M.B.); firstname.lastname@example.org (M.M.)
Abstract: Given a set of nucleotide sequences and corresponding gene annotations, which might contain a moderate number of errors, we consider the problem of identifying common substrings occurring in homologous genes and of identifying putative errors in the given annotations. The problem is solved by identifying nodes in a suffix tree that contains all substrings occurring in the given nucleotide sequences. Due to the large size of the targeted data set, our approach employs a truncated version of suffix trees. The approach is successfully applied to the mitochondrial nucleotide sequences and the corresponding annotations that are available in RefSeq for more than 2000 metazoan species. We demonstrate that the approach finds appropriate substrings despite errors in the given annotations. Moreover, it has identified several hundred errors that occur within the RefSeq annotations.
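The core idea of the abstract above, collecting substrings of bounded length and asking which of them recur across sequences annotated with the same gene, can be illustrated with a much simpler stand-in. The sketch below is not the authors' method: it swaps the truncated suffix tree for a plain set of length-k substrings, requires a substring to occur in every sequence of a gene (so it does not tolerate annotation errors), and the function name `shared_kmers` and the toy input are hypothetical.

```python
from collections import defaultdict


def shared_kmers(annotated, k):
    """Return, per gene, the length-k substrings found in every sequence of that gene.

    `annotated` is a list of (gene_name, sequence) pairs; both the input
    layout and the strict "occurs in all sequences" rule are assumptions
    made for this illustration.
    """
    kmer_sets = defaultdict(list)
    for gene, seq in annotated:
        # All substrings of length k: what a suffix tree truncated at depth k
        # would represent at its deepest level.
        kmer_sets[gene].append({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return {gene: set.intersection(*sets) for gene, sets in kmer_sets.items()}


example = [("cox1", "ACGTAC"), ("cox1", "TTACGT"), ("nad1", "GGGCCC")]
result = shared_kmers(example, 4)
```

A substring that is shared within one gene but also appears in sequences annotated with a different gene would be a natural candidate for closer inspection; that step is left out here.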
Type of Paper: Review
Title: Algorithms for Computing the Triplet and Quartet Distance for General Trees
Authors: Gerth Stølting Brodal 1,2, Morten Kragelund Holt 1,2, Jens Johansen 1,2, Rolf Fagerberg 3, Andreas Sand 4, Christian Nørgaard Storm Pedersen 4 and Thomas Mailund 4,*
Affiliations: 1. Department of Computer Science, Aarhus University, IT-Parken, Aabogade 34 DK-8200 Aarhus N, Denmark
2. MADALGO, Center for Massive Data Algorithms, a Center of the Danish National Research Foundation, Aabogade 34 DK-8200 Aarhus N, Denmark
3. Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55 DK-5230 Odense M, Denmark
4. Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, DK-8000 Aarhus C, Denmark; Email: email@example.com (T.M.); Tel: +45 871 55562; Fax: +45 871 54102.
Abstract: Distance measures between trees are useful for comparing trees in a systematic manner, and several different distance measures have been proposed. The triplet and quartet distances are defined as the number of subsets of three or four leaves, respectively, where the topology of the induced sub-trees differs. These distances can trivially be computed by explicitly enumerating all sets of three or four leaves and testing if the topologies are different, but this leads to running times at least of the order n^3 or n^4 just for enumerating the sets. The different topologies can be counted implicitly, however, and in this paper we will review a series of algorithmic improvements that have been used during the last decade to develop more efficient algorithms by exploiting two different strategies: one based on dynamic programming and one based on colouring leaves in one tree and updating a hierarchical decomposition of the other.
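The trivial cubic-time computation mentioned in the abstract above can be sketched for the triplet distance between two rooted trees on the same leaf set. This is only the naive baseline, not one of the efficient algorithms the review covers. The nested-tuple tree encoding and all function names are assumptions made for this illustration.

```python
from itertools import combinations


def leaf_paths(tree, path=()):
    """Map each leaf label to the tuple of child indices on its root-to-leaf path."""
    if isinstance(tree, str):
        return {tree: path}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, path + (i,)))
    return paths


def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the common path prefix."""
    depth = 0
    for x, y in zip(p, q):
        if x != y:
            break
        depth += 1
    return depth


def triplet_topology(paths, a, b, c):
    """Return the pair forming the cherry, or None if the triplet is unresolved."""
    dab = lca_depth(paths[a], paths[b])
    dac = lca_depth(paths[a], paths[c])
    dbc = lca_depth(paths[b], paths[c])
    if dab > dac:
        return frozenset((a, b))
    if dac > dab:
        return frozenset((a, c))
    if dbc > dab:
        return frozenset((b, c))
    return None  # all three pairwise ancestors coincide


def triplet_distance(tree1, tree2):
    """Count leaf triples whose induced topology differs (naive O(n^3) enumeration)."""
    p1, p2 = leaf_paths(tree1), leaf_paths(tree2)
    if set(p1) != set(p2):
        raise ValueError("trees must have the same leaf set")
    return sum(
        triplet_topology(p1, a, b, c) != triplet_topology(p2, a, b, c)
        for a, b, c in combinations(sorted(p1), 3)
    )


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
star = ("a", "b", "c", "d")
```

Here `triplet_distance(t1, t2)` compares all four leaf triples of the two toy trees, and the `star` tree shows how unresolved triplets arise in general (non-binary) trees.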
Type of Paper: Article
Title: Algorithms for hidden Markov models restricted to occurrences of regular expressions
Authors: Paula Tataru 1, Andreas Sand 1, Asger Hobolth 1, Thomas Mailund 1 and Christian N. S. Pedersen 1,2,*
Affiliations: 1 Bioinformatics Research Centre, Aarhus University, C. F. Møllers Allé 8, DK-8000 Aarhus C, Denmark
2 Department of Computer Science, Aarhus University, Aabogade 34, DK-8200 Aarhus N, Denmark; E-Mail: firstname.lastname@example.org
Abstract: Hidden Markov models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model.
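The "typical procedure" that the abstract above takes as its starting point, decoding first and then counting pattern occurrences in the resulting annotation, might look roughly as follows. This sketch does not implement the restricted algorithms the paper proposes. The two-state toy model, its probabilities, the single-character state names, and the pattern `C+` are all assumptions made for illustration, and every probability is assumed to be strictly positive so that logarithms are defined.

```python
import math
import re


def viterbi(obs, states, start, trans, emit):
    """Return the most probable hidden state path as a string (log-space Viterbi)."""
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for symbol in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr[s] = best
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][symbol])
        back.append(ptr)
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path))


# Toy model: "C" (coding) mostly emits "b", "N" (non-coding) mostly emits "a".
states = ("N", "C")
start = {"N": 0.5, "C": 0.5}
trans = {"N": {"N": 0.5, "C": 0.5}, "C": {"N": 0.5, "C": 0.5}}
emit = {"N": {"a": 0.9, "b": 0.1}, "C": {"a": 0.1, "b": 0.9}}

annotation = viterbi("aabba", states, start, trans, emit)
# Count maximal runs of the coding state in the decoded annotation.
occurrences = len(re.findall(r"C+", annotation))
```

Counting in a single decoded path in this way is the step that the abstract reports can be biased, which is what motivates working with the distribution of the number of occurrences instead.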
Type of Paper: Survey
Title: Algorithmic Perspectives of Network Transitive Reduction Problems and their Applications to Synthesis and Analysis of Biological Networks
Authors: Satabdi Aditya and Bhaskar DasGupta
Affiliation: Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607-7052, USA; E-Mail: email@example.com
Abstract: In this survey paper we will present a number of core algorithmic questions concerning several transitive reduction problems on networks that have applications in network synthesis and analysis involving cellular processes. Our starting point will be the so-called minimum equivalent digraph problem, a classic computational problem in combinatorial algorithms. We will subsequently consider three non-trivial extensions or generalizations of this problem motivated by applications in systems biology. We will then discuss the applications of these algorithmic methodologies in the context of three major biological research questions: synthesizing and simplifying signal transduction networks, drug target designs in disease networks, and measuring degeneracies and redundancies of biological networks.
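To make the starting point of the abstract above concrete: the minimum equivalent digraph problem asks for a smallest subset of edges that preserves all reachability relations. It is hard in general, but for acyclic graphs the answer is the unique transitive reduction, which can be found by dropping every edge that is bypassed by a longer path. The sketch below covers only this acyclic special case, not the extensions the survey discusses. The function names and the edge-list input are assumptions, and the input is assumed to be a DAG without being checked.

```python
def has_alternative_path(adj, source, target):
    """True if target is reachable from source without using the direct edge."""
    stack, seen = [source], {source}
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if node == source and nxt == target:
                continue  # skip the direct edge under test
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def transitive_reduction_dag(edges):
    """Return the edges of a DAG that are not implied by a longer path.

    Assumes the input is acyclic; on graphs with cycles the result is not
    a valid minimum equivalent digraph.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
    return {(u, v) for u, v in set(edges) if not has_alternative_path(adj, u, v)}


# Toy network: the edge a -> c is redundant given a -> b -> c.
reduced = transitive_reduction_dag([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
```

Testing each edge with its own graph search is quadratic or worse in the number of edges, which is acceptable for a small illustration but not for large networks.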
Type of Paper: Review
Title: Software for the Integration of Pathway Data into Bioinformatic Algorithms
Authors: Frank Kramer, Michaela Bayerlova and Tim Beissbarth
Affiliations: University Medical Center Göttingen, Department of Medical Statistics, Humboldtallee 32, D-37073 Goettingen, Germany; E-Mails: firstname.lastname@example.org; email@example.com; firstname.lastname@example.org
Abstract: Easier access and decreased costs have lowered the entrance barrier for performing high-throughput experiments and led to a "data deluge". This surge in generation of new data, both in vitro and in vivo, will naturally entail a surge in newly generated results as well. An important aspect of evaluating results of high-throughput experiments is access to pathway data within the scope of programming environments. This enables researchers to programmatically verify their results, for example by testing for overlaps of new results with available literature knowledge. Additionally, many bioinformatic algorithms can increase their power and robustness if prior knowledge is directly integrated during the analysis. The three central aspects of software packages in this area are the source of the literature knowledge, the extent to which pathway data is available programmatically, and the possibilities to visualize the data. This manuscript will give an overview of tools and review advantages and drawbacks of approaches that allow users to integrate pathway data.
Last update: 18 June 2013