Developments in Bioinformatic Algorithms

A special issue of Biology (ISSN 2079-7737).

Deadline for manuscript submissions: closed (31 July 2013) | Viewed by 58863

Special Issue Editor

Genome Informatics, Institute of Human Genetics, Faculty of Medicine, University of Duisburg-Essen, Germany
Interests: algorithmic bioinformatics; genome informatics; computational biology; next generation sequencing; resource-constrained data analysis; computational diagnostics and prognostics; biomarker discovery; network biology

Special Issue Information

Dear Colleagues,

Questions from molecular biology and genome research have inspired algorithm development in computer science for many years, starting with string algorithms for (small-scale) sequence analysis, and quickly extending to methods from graph and network algorithms, combinatorial optimization, computational statistics and image analysis, to name a few.
Today, new challenges arise from what is called the "data deluge" in biology: in some areas, such as next-generation sequencing or bioimaging, the ability to produce data has grown much more rapidly than the ability to analyze and interpret the resulting data sets. This situation calls for applying novel ideas from algorithm development to the data sets that each biology lab faces today.
This special issue calls for papers that illustrate this process, starting with a clearly motivated biological question, translating it into a formal problem, applying an appropriate algorithmic technique, and finally applying the solution to the original problem.
Relevant ideas from algorithmics include, but are not limited to, sublinear algorithms, streaming algorithms, compressed data structures, sampling techniques, and parallelization techniques.
Each submission should first present the algorithmic ideas separately from their application. One goal of this special issue is to provide a broad overview of recent algorithmic developments in bioinformatics.

Prof. Dr. Sven Rahmann
Guest editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Biology is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data deluge
  • algorithm development
  • algorithm engineering

Published Papers (7 papers)

Research

664 KiB  
Article
Local Similarity Search to Find Gene Indicators in Mitochondrial Genomes
by Ruby L. V. Moritz, Matthias Bernt and Martin Middendorf
Biology 2014, 3(1), 220-242; https://doi.org/10.3390/biology3010220 - 11 Mar 2014
Cited by 66 | Viewed by 6817
Abstract
Given a set of nucleotide sequences we consider the problem of identifying conserved substrings occurring in homologous genes in a large number of sequences. The problem is solved by identifying certain nodes in a suffix tree containing all substrings occurring in the given nucleotide sequences. Due to the large size of the targeted data set, our approach employs a truncated version of suffix trees. Two methods for this task are introduced: (1) The annotation guided marker detection method uses gene annotations which might contain a moderate number of errors; (2) The probability based marker detection method determines sequences that appear significantly more often than expected. The approach is successfully applied to the mitochondrial nucleotide sequences, and the corresponding annotations that are available in RefSeq for 2989 metazoan species. We demonstrate that the approach finds appropriate substrings. Full article
(This article belongs to the Special Issue Developments in Bioinformatic Algorithms)

2597 KiB  
Article
Portraying the Expression Landscapes of B-Cell Lymphoma - Intuitive Detection of Outlier Samples and of Molecular Subtypes
by Lydia Hopp, Kathrin Lembcke, Hans Binder and Henry Wirth
Biology 2013, 2(4), 1411-1437; https://doi.org/10.3390/biology2041411 - 02 Dec 2013
Cited by 15 | Viewed by 7490
Abstract
We present an analytic framework based on Self-Organizing Map (SOM) machine learning to study large-scale patient data sets. The potency of the approach is demonstrated in a case study using gene expression data of more than 200 mature aggressive B-cell lymphoma patients. The method portrays each sample with individual resolution, characterizes the subtypes, disentangles the expression patterns into distinct modules, extracts their functional context using enrichment techniques and enables investigation of the similarity relations between the samples. The method also allows outliers caused by contamination to be detected and corrected. Based on our analysis, we propose a refined classification of B-cell lymphoma into four molecular subtypes which are characterized by differential functional and clinical characteristics. Full article

649 KiB  
Article
Dynamic Programming Used to Align Protein Structures with a Spectrum Is Robust
by Allen Holder, Jacqueline Simon, Jonathon Strauser, Jonathan Taylor and Yosi Shibberu
Biology 2013, 2(4), 1296-1310; https://doi.org/10.3390/biology2041296 - 20 Nov 2013
Cited by 21 | Viewed by 6473
Abstract
Several efficient algorithms to conduct pairwise comparisons among large databases of protein structures have emerged in the recent literature. The central theme is the design of a measure between the Cα atoms of two protein chains, from which dynamic programming is used to compute an alignment. The efficiency and efficacy of these algorithms allows large-scale computational studies that would have been previously impractical. The computational study herein shows that the structural alignment algorithm eigen-decomposition alignment with the spectrum (EIGAs) is robust against both parametric and structural variation. Full article

344 KiB  
Article
Algorithms for Hidden Markov Models Restricted to Occurrences of Regular Expressions
by Paula Tataru, Andreas Sand, Asger Hobolth, Thomas Mailund and Christian N. S. Pedersen
Biology 2013, 2(4), 1282-1295; https://doi.org/10.3390/biology2041282 - 08 Nov 2013
Cited by 27 | Viewed by 7672
Abstract
Hidden Markov Models (HMMs) are widely used probabilistic models, particularly for annotating sequential data with an underlying hidden structure. Patterns in the annotation are often more relevant to study than the hidden structure itself. A typical HMM analysis consists of annotating the observed data using a decoding algorithm and analyzing the annotation to study patterns of interest. For example, given an HMM modeling genes in DNA sequences, the focus is on occurrences of genes in the annotation. In this paper, we define a pattern through a regular expression and present a restriction of three classical algorithms to take the number of occurrences of the pattern in the hidden sequence into account. We present a new algorithm to compute the distribution of the number of pattern occurrences, and we extend the two most widely used existing decoding algorithms to employ information from this distribution. We show experimentally that the expectation of the distribution of the number of pattern occurrences gives a highly accurate estimate, while the typical procedure can be biased in the sense that the identified number of pattern occurrences does not correspond to the true number. We furthermore show that using this distribution in the decoding algorithms improves the predictive power of the model. Full article

Review

590 KiB  
Review
R-Based Software for the Integration of Pathway Data into Bioinformatic Algorithms
by Frank Kramer, Michaela Bayerlová and Tim Beißbarth
Biology 2014, 3(1), 85-100; https://doi.org/10.3390/biology3010085 - 07 Feb 2014
Cited by 46 | Viewed by 10748
Abstract
Putting new findings into the context of available literature knowledge is one approach to deal with the surge of high-throughput data results. Furthermore, prior knowledge can increase the performance and stability of bioinformatic algorithms, for example, methods for network reconstruction. In this review, we examine software packages for the statistical computing framework R, which enable the integration of pathway data for further bioinformatic analyses. Different approaches to integrate and visualize pathway data are identified and packages are stratified concerning their features according to a number of different aspects: data import strategies, the extent of available data, dependencies on external tools, integration with further analysis steps and visualization options are considered. A total of 12 packages integrating pathway data are reviewed in this manuscript. These are supplemented by five R-specific packages for visualization and six connector packages, which provide access to external tools. Full article

361 KiB  
Review
Algorithmic Perspectives of Network Transitive Reduction Problems and their Applications to Synthesis and Analysis of Biological Networks
by Satabdi Aditya, Bhaskar DasGupta and Marek Karpinski
Biology 2014, 3(1), 1-21; https://doi.org/10.3390/biology3010001 - 19 Dec 2013
Cited by 134 | Viewed by 6965
Abstract
In this survey paper, we will present a number of core algorithmic questions concerning several transitive reduction problems on networks that have applications in network synthesis and analysis involving cellular processes. Our starting point will be the so-called minimum equivalent digraph problem, a classic computational problem in combinatorial algorithms. We will subsequently consider a few non-trivial extensions or generalizations of this problem motivated by applications in systems biology. We will then discuss the applications of these algorithmic methodologies in the context of three major biological research questions: synthesizing and simplifying signal transduction networks, analyzing disease networks, and measuring redundancy of biological networks. Full article
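As a rough illustration of the starting point named in this abstract, the sketch below computes a transitive reduction for the simplest case, an acyclic directed graph, where dropping every edge that is implied by a longer path gives a unique smallest edge set with the same reachability. This is not taken from the survey: the function name `transitive_reduction_dag` and the edge-list input format are assumptions made for this example, and inputs with cycles (where the minimum equivalent digraph problem becomes much harder) are rejected rather than handled.

```python
def transitive_reduction_dag(edges):
    """Return the edges of an acyclic digraph that are not implied by a longer path.

    `edges` is an iterable of (source, target) pairs; the name and input
    format are illustrative assumptions, not an API from the survey.
    """
    # Build the successor sets, making sure every node has an entry.
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())

    def reachable(start):
        # Nodes reachable from `start` by following at least one edge.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {u: reachable(u) for u in succ}
    if any(u in reach[u] for u in succ):
        raise ValueError("input contains a cycle; this sketch handles DAGs only")

    kept = set()
    for u in succ:
        for v in succ[u]:
            # Edge (u, v) is redundant if another successor of u already reaches v.
            if not any(v in reach[w] for w in succ[u] if w != v):
                kept.add((u, v))
    return kept


if __name__ == "__main__":
    print(sorted(transitive_reduction_dag([("a", "b"), ("b", "c"), ("a", "c")])))
```

The sketch recomputes reachability from every node, which is fine for small examples but is not meant to reflect the efficient methods the survey discusses.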

1353 KiB  
Review
Algorithms for Computing the Triplet and Quartet Distances for Binary and General Trees
by Andreas Sand, Morten K. Holt, Jens Johansen, Rolf Fagerberg, Gerth Stølting Brodal, Christian N. S. Pedersen and Thomas Mailund
Biology 2013, 2(4), 1189-1209; https://doi.org/10.3390/biology2041189 - 26 Sep 2013
Cited by 13 | Viewed by 12049
Abstract
Distance measures between trees are useful for comparing trees in a systematic manner, and several different distance measures have been proposed. The triplet and quartet distances, for rooted and unrooted trees, respectively, are defined as the number of subsets of three or four leaves, respectively, where the topologies of the induced subtrees differ. These distances can trivially be computed by explicitly enumerating all sets of three or four leaves and testing if the topologies are different, but this leads to time complexities at least of the order n^3 or n^4 just for enumerating the sets. The different topologies can be counted implicitly, however, and in this paper, we review a series of algorithmic improvements that have been used during the last decade to develop more efficient algorithms by exploiting two different strategies for this; one based on dynamic programming and another based on coloring leaves in one tree and updating a hierarchical decomposition of the other. Full article
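The trivial enumeration mentioned in this abstract can be sketched as follows for the triplet distance on rooted trees. This is only the naive baseline, not any of the improved algorithms the review covers. The nested-tuple tree encoding and the names `triplet_distance`, `_leaf_paths`, `_lca_depth`, and `_topology` are assumptions made for this example; leaf labels are assumed to be sortable non-tuple values such as strings.

```python
from itertools import combinations


def _leaf_paths(tree):
    """Map each leaf label to the tuple of internal-node ids on its root path."""
    paths = {}
    counter = [0]

    def walk(node, path):
        if isinstance(node, tuple):  # internal node: children in a tuple
            counter[0] += 1
            here = path + (counter[0],)
            for child in node:
                walk(child, here)
        else:  # leaf label
            paths[node] = path

    walk(tree, ())
    return paths


def _lca_depth(paths, a, b):
    """Depth of the lowest common ancestor = length of the shared path prefix."""
    depth = 0
    for x, y in zip(paths[a], paths[b]):
        if x != y:
            break
        depth += 1
    return depth


def _topology(paths, a, b, c):
    """Return the pair grouped together below the third leaf, or None if unresolved."""
    depths = {
        (a, b): _lca_depth(paths, a, b),
        (a, c): _lca_depth(paths, a, c),
        (b, c): _lca_depth(paths, b, c),
    }
    best = max(depths.values())
    winners = [pair for pair, d in depths.items() if d == best]
    return winners[0] if len(winners) == 1 else None


def triplet_distance(tree1, tree2):
    """Count leaf triples whose induced topologies differ (cubic-time enumeration)."""
    p1, p2 = _leaf_paths(tree1), _leaf_paths(tree2)
    if set(p1) != set(p2):
        raise ValueError("both trees must have the same leaf set")
    return sum(
        _topology(p1, a, b, c) != _topology(p2, a, b, c)
        for a, b, c in combinations(sorted(p1), 3)
    )


if __name__ == "__main__":
    t1 = (("a", "b"), ("c", "d"))
    t2 = (("a", "c"), ("b", "d"))
    print(triplet_distance(t1, t2))
```

Because every one of the n-choose-3 triples is inspected, the running time is at least cubic in the number of leaves, which is exactly the cost the implicit counting strategies are designed to avoid.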