Special Issue "Developments in Bioinformatic Algorithms"

A special issue of Biology (ISSN 2079-7737).

Deadline for manuscript submissions: 31 July 2013

Special Issue Editor

Guest Editor
Prof. Dr. Sven Rahmann
Genome Informatics, Institute of Human Genetics, Faculty of Medicine, University of Duisburg-Essen, Germany
Website: http://www.rahmannlab.de/people/rahmann
E-Mail: Sven.Rahmann@uni-due.de
Interests: algorithmic bioinformatics; genome informatics; computational biology; next generation sequencing; resource-constrained data analysis; computational diagnostics and prognostics; biomarker discovery; network biology

Special Issue Information

Dear Colleagues,

Questions from molecular biology and genome research have inspired algorithm development in computer science for many years, starting with string algorithms for (small-scale) sequence analysis, and quickly extending to methods from graph and network algorithms, combinatorial optimization, computational statistics and image analysis, to name a few.
Today, new challenges arise from what is called the "data deluge" in biology: In some areas, such as next generation sequencing or bioimaging, the ability to produce data has grown much more rapidly than the ability to analyze and interpret the data sets. This situation calls for applying novel ideas from algorithm development to the data sets that each biology lab faces today.
This special issue calls for papers that illustrate this process, starting with a clearly motivated biological question, translating it into a formal problem, applying an appropriate algorithmic technique, and finally applying the solution to the original problem.
Relevant ideas from algorithmics include, but are not limited to, sublinear algorithms, streaming algorithms, compressed data structures, sampling techniques, or parallelization techniques.
Each submission should make an effort to present the algorithmic ideas first, separately from their application. One goal of this special issue is to provide a broad overview of recent algorithmic developments in bioinformatics.

Prof. Dr. Sven Rahmann
Guest editor

Submission

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. Papers will be published continuously (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are refereed through a peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biology is an international peer-reviewed Open Access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. For the first couple of issues the Article Processing Charge (APC) will be waived for well-prepared manuscripts. English correction and/or formatting fees of 250 CHF (Swiss Francs) will be charged in certain cases for those articles accepted for publication that require extensive additional formatting and/or English corrections.

Keywords

  • data deluge
  • algorithm development
  • algorithm engineering

Published Papers

No papers have been published in this special issue yet; see below for planned papers.

Planned Papers

The list below represents only planned manuscripts. Some of these manuscripts have not been received by the Editorial Office yet. Papers submitted to MDPI journals are subject to peer review.

Type of Paper: Article
Title: Local Similarity Search to Find Gene Indicators in Mitochondrial Genomes
Authors: R. L. V. Moritz, M. Bernt and M. Middendorf
Affiliation: Parallel Computing and Complex Systems, Faculty of Mathematics and Computer Science, Universität Leipzig; E-Mail: rmoritz@informatik.uni-leipzig.de (R.L.V.M.); bernt@informatik.uni-leipzig.de (M.B.); middendorf@informatik.uni-leipzig.de (M.M.)
Abstract: Given a set of nucleotide sequences and corresponding gene annotations, which might contain a moderate number of errors, we consider the problem of identifying common substrings occurring in homologous genes and of identifying putative errors in the given annotations. The problem is solved by identifying nodes in a suffix tree that contains all substrings occurring in the given nucleotide sequences. Due to the large size of the targeted data set, our approach employs a truncated version of suffix trees. The approach is successfully applied to the mitochondrial nucleotide sequences and the corresponding annotations that are available in RefSeq for the more than 2000 metazoan species. We demonstrate that the approach finds appropriate substrings despite errors in the given annotations. Moreover, it has identified several hundred errors that occur within the RefSeq annotations.
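
To illustrate the general idea of a depth-limited index, the following is a minimal sketch and not the authors' method: it builds a plain suffix trie truncated at a fixed depth (a simplified stand-in for a truncated suffix tree) and reports substrings that occur in every input sequence. The function names, the dictionary-based node layout, and the toy sequences are assumptions made for this example only.

```python
def build_truncated_trie(seqs, max_depth):
    """Insert every suffix of every sequence, cut off at max_depth characters.

    seqs maps a sequence id to its nucleotide string. Each trie node stores
    the ids of all sequences that contain the substring spelled by its path.
    """
    root = {"children": {}, "ids": set()}
    for seq_id, seq in seqs.items():
        for start in range(len(seq)):
            node = root
            for ch in seq[start:start + max_depth]:
                node = node["children"].setdefault(
                    ch, {"children": {}, "ids": set()}
                )
                node["ids"].add(seq_id)
    return root


def common_substrings(root, n_seqs, min_len):
    """Return, sorted, all indexed substrings of length >= min_len
    that occur in all n_seqs sequences."""
    found = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        for ch, child in node["children"].items():
            # Descend only while the substring is still shared by all sequences.
            if len(child["ids"]) == n_seqs:
                text = prefix + ch
                if len(text) >= min_len:
                    found.append(text)
                stack.append((child, text))
    return found and sorted(found)


# Hypothetical toy input, not real mitochondrial data.
seqs = {"s1": "ACGTAC", "s2": "TTACGT", "s3": "GACGTT"}
trie = build_truncated_trie(seqs, max_depth=4)
print(common_substrings(trie, len(seqs), min_len=3))  # ['ACG', 'ACGT', 'CGT']
```

Truncating at a fixed depth bounds the size of the index by the number of distinct substrings up to that length, at the cost of not seeing longer shared substrings.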

Type of Paper: Review
Title: Algorithms for Computing the Triplet and Quartet Distance for General Trees
Authors: Gerth Stølting Brodal 1,2, Morten Kragelund Holt 1,2, Jens Johansen 1,2, Rolf Fagerberg 3, Andreas Sand 4, Christian Nørgaard Storm Pedersen 4 and Thomas Mailund 4,*
Affiliations: 1. Department of Computer Science, Aarhus University, IT-Parken, Aabogade 34 DK-8200 Aarhus N, Denmark
2. MADALGO, Center for Massive Data Algorithms, a Center of the Danish National Research Foundation, Aabogade 34 DK-8200 Aarhus N, Denmark
3. Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55 DK-5230 Odense M, Denmark
4. Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, DK-8000 Aarhus C, Denmark; Email: mailund@birc.au.dk (T.M.); Tel: +45 871 55562; Fax: +45 871 54102.
Abstract: Distance measures between trees are useful for comparing trees in a systematic manner, and several different distance measures have been proposed. The triplet and quartet distances are defined as the number of subsets of three or four leaves, respectively, where the topology of the induced sub-trees differs. These distances can trivially be computed by explicitly enumerating all sets of three or four leaves and testing if the topologies are different, but this leads to running times at least of the order n^3 or n^4 just for enumerating the sets. The different topologies can be counted implicitly, however, and in this paper we will review a series of algorithmic improvements that have been used during the last decade to develop more efficient algorithms by exploiting two different strategies for this: one based on dynamic programming and one based on colouring leaves in one tree and updating a hierarchical decomposition of the other.
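
The trivial enumeration mentioned in the abstract can be sketched as follows for the triplet distance on rooted trees. This is only the naive cubic baseline, not one of the efficient algorithms under review. It assumes that trees are given as nested Python tuples with string leaf labels, and all function names are hypothetical.

```python
from itertools import combinations


def leaf_paths(tree, prefix=()):
    """Map each leaf label to its root-to-leaf path of child indices."""
    if not isinstance(tree, tuple):
        return {tree: prefix}
    paths = {}
    for i, child in enumerate(tree):
        paths.update(leaf_paths(child, prefix + (i,)))
    return paths


def lca_depth(p, q):
    """Depth of the lowest common ancestor = length of the common path prefix."""
    depth = 0
    for x, y in zip(p, q):
        if x != y:
            break
        depth += 1
    return depth


def triplet_topology(paths, a, b, c):
    """Return the pair of leaves that is grouped together below the other
    leaf, or None if the triplet is unresolved (possible in general trees)."""
    d_ab = lca_depth(paths[a], paths[b])
    d_ac = lca_depth(paths[a], paths[c])
    d_bc = lca_depth(paths[b], paths[c])
    if d_ab == d_ac == d_bc:
        return None
    deepest = max(d_ab, d_ac, d_bc)
    if d_ab == deepest:
        return frozenset((a, b))
    if d_ac == deepest:
        return frozenset((a, c))
    return frozenset((b, c))


def triplet_distance(tree1, tree2):
    """Count leaf triples whose induced topology differs between the trees."""
    p1, p2 = leaf_paths(tree1), leaf_paths(tree2)
    if set(p1) != set(p2):
        raise ValueError("both trees must have the same leaf set")
    return sum(
        triplet_topology(p1, a, b, c) != triplet_topology(p2, a, b, c)
        for a, b, c in combinations(sorted(p1), 3)
    )


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
print(triplet_distance(t1, t2))  # 4: every one of the four triples differs
```

The loop over all three-leaf subsets alone takes time of the order n^3, which is the cost the implicit counting strategies avoid.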

Type of Paper: Article
Title: Algorithms for hidden Markov models restricted to occurrences of regular expressions
Authors: Paula Tataru 1, Andreas Sand 1, Asger Hobolth 1, Thomas Mailund 1 and Christian N. S. Pedersen 1,2,*
Affiliations: 1 Bioinformatics Research Centre, Aarhus University, C. F. Møllers Allé 8, DK-8000 Aarhus C, Denmark
2 Department of Computer Science, Aarhus University, Aabogade 34, DK-8200 Aarhus N, Denmark; E-Mail: cstorm@birc.au.dk
Abstract: Hidden Markov models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model.
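
The "typical procedure" described in the abstract (decode first, then count pattern occurrences in the annotation) can be sketched as below. This is not the new algorithm of the paper: it is standard Viterbi decoding followed by a regular-expression count. The two-state model with states "N" (non-coding) and "C" (coding), its probabilities, the observation alphabet, and the pattern are all assumptions chosen for illustration.

```python
import math
import re


def viterbi(obs, states, start, trans, emit):
    """Most probable hidden state path for a non-empty observation string.

    States must be single characters so the path can be returned as a string.
    Scores are kept in log space to avoid numerical underflow.
    """
    score = {s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_score, pointers = {}, {}
        for s in states:
            best = max(states, key=lambda r: score[r] + math.log(trans[r][s]))
            new_score[s] = (
                score[best] + math.log(trans[best][s]) + math.log(emit[s][symbol])
            )
            pointers[s] = best
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


def count_pattern(path, pattern):
    """Number of non-overlapping matches of a regular expression in the path."""
    return len(re.findall(pattern, path))


# Hypothetical toy model: "N" mostly emits "a", "C" mostly emits "b".
states = ["N", "C"]
start = {"N": 0.5, "C": 0.5}
trans = {"N": {"N": 0.8, "C": 0.2}, "C": {"N": 0.2, "C": 0.8}}
emit = {"N": {"a": 0.9, "b": 0.1}, "C": {"a": 0.1, "b": 0.9}}

path = viterbi("aabbbaabb", states, start, trans, emit)
print(path)                       # NNCCCNNCC
print(count_pattern(path, "C+"))  # 2 maximal runs of the coding state
```

Because the count is read off a single decoded path, it ignores all other plausible paths, which is the source of the bias the abstract refers to.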

Last update: 15 May 2013

Biology EISSN 2079-7737, published by MDPI AG, Basel, Switzerland