Special Issue "Genomes and Evolution: Computational Approaches"

A special issue of Computation (ISSN 2079-3197). This special issue belongs to the section "Computational Biology".

Deadline for manuscript submissions: closed (31 October 2014)

Special Issue Editors

Guest Editor
Prof. Dr. Rainer Breitling

Manchester Institute of Biotechnology, University of Manchester, 131 Princess Street, Manchester, M1 7DN, UK
Interests: computational systems biology; bioinformatics; metabolomics; dynamic modelling; synthetic biology
Guest Editor
Dr. Marnix Medema

Bioinformatics Group, Wageningen University, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
Phone: +31 317 484 706

Special Issue Information

Dear Colleagues,

The computational analysis of gene and genome sequences has become a key methodology for understanding the function and evolution of biological systems. Often, the specific computational methods that have led to exciting research results are discussed only briefly, or relegated to the supplementary information of the papers describing them. Yet, many of these methods merit a more thorough discussion of the key concepts on which they are based, and of the opportunities for applying them in other contexts. This Special Issue aims to offer a platform for explaining, discussing and contextualizing important computational methods and algorithms. Such methods can assist other scientists researching the evolutionary history of gene and genome sequences and the biological functions of those genes.

Specific topics include, but are not limited to:

  • Methods for tracing the evolutionary history of genome sequences, including, for example, the dynamics of introns and transposons, as well as duplication, recombination, and horizontal transfer events
  • Methods for improving (meta)genome assembly by employing evolutionary information
  • Phylogenetic methods for evaluating evolutionary relationships between genes and genomes
  • Algorithms for studying patterns in amino acid sequences and/or protein structure evolution
  • Tools for automating the annotation of genomes or genomic regions according to function
  • Algorithms or pipelines for identifying mutations from high-throughput sequencing experiments
  • Pipelines for evaluating the outcome of next-generation sequence assemblies
  • Methods for evaluating the evolutionary similarity of genes, gene clusters, genomes, pan-genomes or metagenomes
  • Models and tools for simulating, predicting or otherwise evaluating the evolution of genome-based metabolic or regulatory networks from a systems biology perspective

Prof. Dr. Rainer Breitling
Dr. Marnix Medema
Guest Editors

Submission

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. Papers will be published continuously (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are refereed through a peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Computation is an international peer-reviewed Open Access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. For the first couple of issues the Article Processing Charge (APC) will be waived for well-prepared manuscripts. English correction and/or formatting fees of 250 CHF (Swiss Francs) will be charged in certain cases for those articles accepted for publication that require extensive additional formatting and/or English corrections.


Keywords

  • bioinformatics
  • computational biology
  • evolution
  • systems biology
  • algorithms
  • comparative genomics
  • phylogeny
  • sequence analysis
  • metagenomics

Published Papers (8 papers)

Research

Open Access Article: Computational Recognition of RNA Splice Sites by Exact Algorithms for the Quadratic Traveling Salesman Problem
Computation 2015, 3(2), 285-298; doi:10.3390/computation3020285
Received: 3 December 2014 / Revised: 16 April 2015 / Accepted: 12 May 2015 / Published: 3 June 2015
Abstract
One fundamental problem of bioinformatics is the computational recognition of DNA and RNA binding sites. Given a set of short DNA or RNA sequences of equal length such as transcription factor binding sites or RNA splice sites, the task is to learn a pattern from this set that allows the recognition of similar sites in another set of DNA or RNA sequences. Permuted Markov (PM) models and permuted variable length Markov (PVLM) models are two powerful models for this task, but the problem of finding an optimal PM model or PVLM model is NP-hard. While the problem of finding an optimal PM model or PVLM model of order one is equivalent to the traveling salesman problem (TSP), the problem of finding an optimal PM model or PVLM model of order two is equivalent to the quadratic TSP (QTSP). Several exact algorithms exist for solving the QTSP, but it is unclear if these algorithms are capable of solving QTSP instances resulting from RNA splice sites of at least 150 base pairs in a reasonable time frame. Here, we investigate the performance of three exact algorithms for solving the QTSP for ten datasets of splice acceptor sites and splice donor sites of five different species and find that one of these algorithms is capable of solving QTSP instances of up to 200 base pairs with a running time of less than two days.
(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)
Open Access Article: A Guide to Phylogenetic Reconstruction Using Heterogeneous Models—A Case Study from the Root of the Placental Mammal Tree
Computation 2015, 3(2), 177-196; doi:10.3390/computation3020177
Received: 26 December 2014 / Revised: 30 March 2015 / Accepted: 31 March 2015 / Published: 15 April 2015
Cited by 2
Abstract
There are numerous phylogenetic reconstruction methods and models available, but which should you use and why? Important considerations in phylogenetic analyses include data quality, structure, signal, alignment length and sampling. If poorly modelled, variation in rates of change across proteins and across lineages can lead to incorrect phylogeny reconstruction, which can then lead to downstream misinterpretation of the underlying data. The risk of choosing and applying an inappropriate model can be reduced with some critical yet straightforward steps outlined in this paper. We use the question of the position of the root of placental mammals as our working example to illustrate the topological impact of model misspecification. Using this case study we focus on using models in a Bayesian framework and we outline the steps involved in identifying and assessing better fitting models for specific datasets.
(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)
Open Access Article: Evolution by Pervasive Gene Fusion in Antibiotic Resistance and Antibiotic Synthesizing Genes
Computation 2015, 3(2), 114-127; doi:10.3390/computation3020114
Received: 8 September 2014 / Revised: 10 March 2015 / Accepted: 11 March 2015 / Published: 26 March 2015
Cited by 1
Abstract
Phylogenetic (tree-based) approaches to understanding evolutionary history are unable to incorporate convergent evolutionary events where two genes merge into one. In this study, as exemplars of what can be achieved when a tree is not assumed a priori, we have analysed the evolutionary histories of polyketide synthase genes and antibiotic resistance genes and have shown that their history is replete with convergent events as well as divergent events. We demonstrate that the overall histories of these genes more closely resemble the remodelling that might be seen with the children’s toy Lego than the standard model of the phylogenetic tree. This work demonstrates further that genes can act as public goods, available for re-use and incorporation into other genetic goods.
(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)
Open Access Article: Computational and Statistical Analyses of Insertional Polymorphic Endogenous Retroviruses in a Non-Model Organism
Computation 2014, 2(4), 221-245; doi:10.3390/computation2040221
Received: 5 August 2014 / Revised: 20 October 2014 / Accepted: 24 October 2014 / Published: 28 November 2014
Cited by 1
Abstract
Endogenous retroviruses (ERVs) are a class of transposable elements found in all vertebrate genomes that contribute substantially to genomic functional and structural diversity. A host species acquires an ERV when an exogenous retrovirus infects a germ cell of an individual and becomes part of the genome inherited by viable progeny. ERVs that colonized ancestral lineages are fixed in contemporary species. However, in some extant species, ERV colonization is ongoing, which results in variation in ERV frequency in the population. To study the consequences of ERV colonization of a host genome, methods are needed to assign each ERV to a location in a species’ genome and determine which individuals have acquired each ERV by descent. Because well-annotated reference genomes are not widely available for all species, de novo clustering approaches provide an alternative to reference mapping that is insensitive to differences between query and reference and amenable to mobile element studies in both model and non-model organisms. However, there is substantial uncertainty in both identifying ERV genomic position and assigning each unique ERV integration site to individuals in a population. We present an analysis suitable for detecting ERV integration sites in species without the need for a reference genome. Our approach is based on improved de novo clustering methods and statistical models that take the uncertainty of assignment into account and yield a probability matrix of shared ERV integration sites among individuals. We demonstrate that polymorphic integrations of a recently identified endogenous retrovirus in deer reflect contemporary relationships among individuals and populations.
(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)
Open Access Article: Incongruencies in Vaccinia Virus Phylogenetic Trees
Computation 2014, 2(4), 182-198; doi:10.3390/computation2040182
Received: 30 June 2014 / Revised: 22 September 2014 / Accepted: 29 September 2014 / Published: 14 October 2014
Abstract
Over the years, as more complete poxvirus genomes have been sequenced, phylogenetic studies of these viruses have become more prevalent. In general, the results show similar relationships between the poxvirus species; however, some inconsistencies are notable. Previous analyses of the viral genomes contained within the vaccinia virus (VACV)-Dryvax vaccine revealed that their phylogenetic relationships were sometimes clouded by low bootstrapping confidence. To analyze the VACV-Dryvax genomes in detail, a new tool-set was developed and integrated into the Base-By-Base bioinformatics software package. Analyses showed that fewer unique positions were present in each VACV-Dryvax genome than expected. A series of patterns, each containing several single nucleotide polymorphisms (SNPs), was identified that ran counter to the results of the phylogenetic analysis. The VACV genomes were found to contain short DNA sequence blocks that matched more distantly related clades. Additionally, similar non-conforming SNP patterns were observed in (1) the variola virus clade; (2) some cowpox clades; and (3) VACV-CVA, the direct ancestor of VACV-MVA. Thus, traces of past recombination events are common in the various orthopoxvirus clades, including those associated with smallpox and cowpox viruses.
(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)
Open Access Article: On Mechanistic Modeling of Gene Content Evolution: Birth-Death Models and Mechanisms of Gene Birth and Gene Retention
Computation 2014, 2(3), 112-130; doi:10.3390/computation2030112
Received: 6 May 2014 / Revised: 29 July 2014 / Accepted: 14 August 2014 / Published: 28 August 2014
Cited by 4
Abstract
Characterizing the mechanisms of duplicate gene retention using phylogenetic methods requires models that are consistent with different biological processes. The interplay between complex biological processes and necessarily simpler statistical models leads to a complex modeling problem. A discussion of the relationship between biological processes, existing models for duplicate gene retention and data is presented. Existing models are then extended in deriving two new birth/death models for phylogenetic application in a gene tree/species tree reconciliation framework to enable probabilistic inference of the mechanisms from model parameterization. The goal of this work is to synthesize a detailed discussion of modeling duplicate genes to address biological questions, moving from previous work to future trajectories with the aim of generating better models and better inference.
(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)

Review

Open Access Review: Evolutionary Dynamics in Gene Networks and Inference Algorithms
Computation 2015, 3(1), 99-113; doi:10.3390/computation3010099
Received: 31 October 2014 / Revised: 18 February 2015 / Accepted: 3 March 2015 / Published: 13 March 2015
Abstract
Dynamical interactions among sets of genes (and their products) regulate developmental processes and some dynamical diseases, like cancer. Gene regulatory networks (GRNs) are directed networks that define interactions (links) among different genes/proteins involved in such processes. Genetic regulation can be modified during the time course of the process, which may imply changes in node activity that lead the system from a specific state to a different one at a later time (dynamics). How the GRN modifies its topology to properly drive a developmental process, and how this regulation was acquired across evolution, are questions that the evolutionary dynamics of gene networks tackles. In the present work we review important methodology in the field and highlight the combination of these methods with evolutionary algorithms. In recent years, this combination has become a powerful tool to fit models with the increasingly available experimental data.
(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)
Open Access Review: Computation of the Likelihood in Biallelic Diffusion Models Using Orthogonal Polynomials
Computation 2014, 2(4), 199-220; doi:10.3390/computation2040199
Received: 14 July 2014 / Revised: 12 October 2014 / Accepted: 16 October 2014 / Published: 14 November 2014
Cited by 2
Abstract
In population genetics, parameters describing forces such as mutation, migration and drift are generally inferred from molecular data. Lately, approximate methods based on simulations and summary statistics have been widely applied for such inference, even though these methods waste information. In contrast, probabilistic methods of inference can be shown to be optimal, if their assumptions are met. In genomic regions where recombination rates are high relative to mutation rates, polymorphic nucleotide sites can be assumed to evolve independently from each other. The distribution of allele frequencies at a large number of such sites has been called “allele-frequency spectrum” or “site-frequency spectrum” (SFS). Conditional on the allelic proportions, the likelihoods of such data can be modeled as binomial. A simple model representing the evolution of allelic proportions is the biallelic mutation-drift or mutation-directional selection-drift diffusion model. With series of orthogonal polynomials, specifically Jacobi and Gegenbauer polynomials, or the related spheroidal wave function, the diffusion equations can be solved efficiently. In the neutral case, the product of the binomial likelihoods with the sum of such polynomials leads to finite series of polynomials, i.e., relatively simple equations, from which the exact likelihoods can be calculated. In this article, the use of orthogonal polynomials for inferring population genetic parameters is investigated.
(This article belongs to the Special Issue Genomes and Evolution: Computational Approaches)
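As a rough illustration of the binomial step mentioned in the abstract above, the following Python sketch evaluates the likelihood of an allele count conditional on an allelic proportion, and averages it over a discretized distribution of proportions. It is not the article's method: the function names, the grid, and the weights are hypothetical, and the orthogonal-polynomial solution of the diffusion equation is replaced here by a plain weighted sum.

```python
from math import comb


def binomial_likelihood(allele_count, sample_size, allelic_proportion):
    """Probability of seeing `allele_count` copies of an allele among
    `sample_size` sampled sequences, given the population allelic proportion."""
    x = allelic_proportion
    return (comb(sample_size, allele_count)
            * x ** allele_count
            * (1 - x) ** (sample_size - allele_count))


def marginal_likelihood(allele_count, sample_size, grid, weights):
    """Average the binomial likelihood over a discretized distribution of
    allelic proportions. `grid` and `weights` are illustrative assumptions;
    the weights are expected to sum to one."""
    return sum(w * binomial_likelihood(allele_count, sample_size, x)
               for x, w in zip(grid, weights))


# Hypothetical usage: one site, 10 sampled sequences, 3 copies of the allele,
# and a uniform distribution over three candidate allelic proportions.
site_likelihood = marginal_likelihood(3, 10, [0.1, 0.3, 0.5], [1 / 3, 1 / 3, 1 / 3])
```

Because sites are assumed to evolve independently, the likelihood of a whole site-frequency spectrum would be the product of such per-site terms (or the sum of their logarithms).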

Planned Papers

The list below contains only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Authors: Héctor Romero et al.
Title: Operon conservation measures
Abstract: Operons are well-known genetic structures prevalent in Bacteria and Archaea. Genes that encode proteins sharing a metabolic pathway, composing the same molecular ensemble, or being nodes in a certain regulation network, are usually organized in operons. Although plenty of data are available, how operons appear, evolve and die is still a matter of heated debate. We developed methods to measure operon conservation between organisms based on the rather straightforward idea of comparing the operon organization of orthologous genes in two different organisms. A given operon organization can be seen as a partition of the gene complement, so we used tools for comparing partitions to assess operon conservation. We then test these measures with different genetic distances and genome sizes, using the complete gene complement, core genes or accessory genes.
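To make the partition idea in the abstract above concrete, here is a minimal Python sketch that scores the agreement between two operon organizations with the Rand index. The abstract does not say which partition-comparison measures the authors use, so the Rand index, the function name, and the gene and operon labels below are all assumptions chosen for illustration.

```python
from itertools import combinations


def rand_index(operons_a, operons_b):
    """Fraction of orthologous gene pairs on which two operon organizations
    agree (both place the pair in one operon, or both split it).

    Each argument maps an ortholog identifier to an operon label in one
    organism; only orthologs present in both mappings are compared."""
    shared = sorted(set(operons_a) & set(operons_b))
    pairs = list(combinations(shared, 2))
    if not pairs:
        raise ValueError("need at least two shared orthologs")
    agreements = sum(
        (operons_a[g] == operons_a[h]) == (operons_b[g] == operons_b[h])
        for g, h in pairs
    )
    return agreements / len(pairs)


# Hypothetical orthologs g1..g3 and operon labels in two organisms.
organism_1 = {"g1": "opA", "g2": "opA", "g3": "opB"}
organism_2 = {"g1": "opX", "g2": "opY", "g3": "opY"}
conservation = rand_index(organism_1, organism_2)
```

A value of 1 means the two organisms group the shared orthologs identically; lower values indicate rearranged operons. Restricting the input mappings to core or accessory genes would give the corresponding variants mentioned in the abstract.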


Authors: Claus Vogl et al.
Title: Computation of the likelihood in bi-allelic diffusion models using orthogonal polynomials
Abstract: In population genetics, parameters describing forces such as mutation, migration, and drift are generally inferred from molecular data. Lately, methods based on simulations and summary statistics have been widely applied for such inference, even though these methods are only approximate and thus waste information. In contrast, probabilistic methods of inference can be shown to be optimal, if their assumptions are met. In genomic regions where recombination rates are high relative to mutation rates, polymorphic nucleotide sites can be assumed to evolve independently from each other. The distribution of allele frequencies at a large number of such sites has been called “allele-frequency spectrum” or “site-frequency spectrum” (SFS). Conditional on the allelic proportions, the likelihoods of such data are binomial. A simple model representing the evolution of allelic proportions is the bi-allelic mutation-drift or mutation-migration-drift diffusion model. With infinite series of orthogonal polynomials, specifically Jacobi and Gegenbauer polynomials, the diffusion equations can be solved efficiently and flexibly, even in non-equilibrium situations. The product of the binomial likelihoods with the sum of such polynomials leads to finite series of polynomials, i.e., relatively simple equations for the exact likelihoods. In this article, I investigate the use of orthogonal polynomials in the inference of population genetic parameters.

Authors: Michael T. Fluhler, Esq. and Dennis S. Fernandez, Esq.
Title: Intellectual Property Strategies for Genomics, Bioinformatics, and Computational Intelligence
Abstract: Bioinformatics and computational biology, especially in the context of genomics, are ever-growing fields that require the direct application of computational intelligence. The worldwide intellectual property (IP) ecosystem continuously evolves, especially with the recent reformation of the American patent system, and thus IP rights and strategies continue to be increasingly vital in these fields. In order to better understand the status quo of IP specifically in the fields of biology that apply computational intelligence, basic IP definitions, recent IP developments, and advanced protection, enforcement, and monetization strategies are discussed.

Journal Contact

MDPI AG
Computation Editorial Office
St. Alban-Anlage 66, 4052 Basel, Switzerland
computation@mdpi.com
Tel. +41 61 683 77 34
Fax: +41 61 302 89 18