Special Issue "Next Generation Sequencing Approaches in Biology"

A special issue of Biology (ISSN 2079-7737).

Deadline for manuscript submissions: closed (31 July 2012)

Special Issue Editor

Guest Editor
Prof. Dr. Mario Stanke

University of Greifswald, Institute for Mathematics and Computer Science, 17487 Greifswald, Germany

Special Issue Information

Dear Colleagues,

New fast and cheap DNA sequencing technologies from an increasing number of different platforms have advanced biological research applications such as the sequencing of new genomes, the resequencing of individuals' genomes and the analysis of gene expression. Whereas only a decade ago a single genome sequencing project could keep a large consortium busy for years, many sequencing projects now target dozens, hundreds or even thousands of genomes in a single undertaking. The new technology also promises to make medicine more individual and to enhance the understanding of the genotype's influence on diseases and traits. However, next generation sequencing technologies also bring new challenges, such as the need for more efficient mapping and assembly algorithms and the necessity to deal with shorter reads or with reads that have a higher error rate. Further challenges, in distributed computing, arise from the much larger data sizes. This special issue will cover original research papers on bioinformatics and statistical methods that make use of next generation sequencing. Reviews are also welcome.

Prof. Dr. Mario Stanke
Guest Editor

Keywords

  • RNA-seq
  • assembly
  • expression analysis
  • mapping
  • alignment
  • gene finding
  • SNPs
  • genome-wide association
  • copy number variations
  • cloud computing
  • ChIP-seq

Published Papers (12 papers)

Research

Open Access Article FLEXBAR—Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms
Biology 2012, 1(3), 895-905; doi:10.3390/biology1030895
Received: 28 August 2012 / Revised: 21 September 2012 / Accepted: 7 December 2012 / Published: 14 December 2012
Cited by 66 | PDF Full-text (261 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Quantitative and systems biology approaches benefit from the unprecedented depth of next-generation sequencing. A typical experiment yields millions of short reads, which often carry particular sequence tags. These tags may: (a) be specific to the sequencing platform and library construction method (e.g., adapter sequences); (b) have been introduced by experimental design (e.g., sample barcodes); or (c) constitute some biological signal (e.g., splice leader sequences in nematodes). Our software FLEXBAR enables accurate recognition, sorting and trimming of sequence tags with maximal flexibility, based on exact overlap sequence alignment. The software supports data formats from all current sequencing platforms, including color-space reads. FLEXBAR maintains read pairings and processes separate barcode reads on demand. Our software facilitates the fine-grained adjustment of sequence tag detection parameters and search regions. FLEXBAR is multi-threaded and combines speed with precision. Even complex read processing scenarios can be executed with a single command line call. We demonstrate the utility of the software in terms of read mapping applications, library demultiplexing and splice leader detection. FLEXBAR and additional information are available for academic use from the website: http://sourceforge.net/projects/flexbar/.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
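To make the idea of overlap-based tag trimming concrete, here is a minimal Python sketch of 3' adapter removal. It is not FLEXBAR's implementation: it assumes a mismatch-only comparison without gaps, and the function name, parameters, adapter and reads are hypothetical, chosen for illustration only.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3, max_error_rate=0.1):
    """Remove a 3' adapter, or a prefix of it, from the end of a read.

    Simplified illustration (assumption): only mismatches are counted,
    no insertions or deletions, unlike a full overlap alignment.
    """
    # Try every position where the adapter could start, leftmost first.
    for start in range(len(read) - min_overlap + 1):
        # The adapter may run past the end of the read, so compare only
        # the part that overlaps the read.
        overlap = min(len(adapter), len(read) - start)
        mismatches = sum(
            1
            for r, a in zip(read[start:start + overlap], adapter[:overlap])
            if r != a
        )
        if mismatches <= max_error_rate * overlap:
            return read[:start]  # keep everything before the adapter
    return read  # no acceptable overlap found


# Hypothetical adapter and reads, for illustration only.
ADAPTER = "AGATCGGAAGAGC"
print(trim_3prime_adapter("ACGTACGTAGATCGGAAG", ADAPTER))  # ACGTACGT
print(trim_3prime_adapter("ACGTACGTACGT", ADAPTER))        # ACGTACGTACGT
```

Scanning from the left means the longest possible overlap is tried first; the minimum overlap guards against trimming reads whose last base or two match the adapter by chance.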
Open Access Article Whole Genome Sequencing and a New Bioinformatics Platform Allow for Rapid Gene Identification in D. melanogaster EMS Screens
Biology 2012, 1(3), 766-777; doi:10.3390/biology1030766
Received: 5 October 2012 / Revised: 14 November 2012 / Accepted: 20 November 2012 / Published: 5 December 2012
Cited by 4 | PDF Full-text (414 KB) | HTML Full-text | XML Full-text
Abstract
Forward genetic screens in Drosophila melanogaster using ethyl methanesulfonate (EMS) mutagenesis are a powerful approach for identifying genes that modulate specific biological processes in an in vivo setting. The mapping of genes that contain randomly-induced point mutations has become more efficient in Drosophila thanks to the maturation and availability of many types of genetic tools. However, classic approaches to gene mapping are relatively slow and ultimately require extensive Sanger sequencing of candidate chromosomal loci. With the advent of new high-throughput sequencing techniques, it is increasingly efficient to directly re-sequence the whole genome of model organisms. This approach, in combination with traditional chromosomal mapping, has the potential to greatly simplify and accelerate mutation identification in mutants generated in EMS screens. Here we show that next-generation sequencing (NGS) is an accurate and efficient tool for high-throughput sequencing and mutation discovery in Drosophila melanogaster. As a test case, mutant strains of Drosophila that exhibited long-term survival of severed peripheral axons were identified in a forward EMS mutagenesis screen. All mutants were recessive and fell into a single lethal complementation group, which suggested that a single gene was responsible for the protective axon degenerative phenotype. Whole genome sequencing of these genomes identified the underlying gene ect4. To improve the process of genome-wide mutation identification, we developed Genomes Management Application (GEM.app, https://genomics.med.miami.edu), a graphical online user interface to a custom query framework. Using a custom GEM.app query, we were able to identify that each mutant carried a unique nonsense mutation in the gene ect4 (dSarm), which was recently shown by Osterloh et al. to be essential for the activation of axonal degeneration.
Our results demonstrate the current advantages and limitations of NGS in Drosophila and we introduce GEM.app as a simple yet powerful genomics analysis tool for the Drosophila community. At a current cost of <$1,000 per genome, NGS should thus become a standard gene discovery tool in EMS-induced forward genetic screens.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
Open Access Article The Population Genomics of Sunflowers and Genomic Determinants of Protein Evolution Revealed by RNAseq
Biology 2012, 1(3), 575-596; doi:10.3390/biology1030575
Received: 30 August 2012 / Revised: 9 October 2012 / Accepted: 12 October 2012 / Published: 25 October 2012
Cited by 13 | PDF Full-text (911 KB) | HTML Full-text | XML Full-text
Abstract
Few studies have investigated the causes of evolutionary rate variation among plant nuclear genes, especially in recently diverged species still capable of hybridizing in the wild. The recent advent of Next Generation Sequencing (NGS) permits investigation of genome-wide rates of protein evolution and the role of selection in generating and maintaining divergence. Here, we use individual whole-transcriptome sequencing (RNAseq) to refine our understanding of the population genomics of wild species of sunflowers (Helianthus spp.) and the factors that affect rates of protein evolution. We aligned 35 GB of transcriptome sequencing data and identified 433,257 polymorphic sites (SNPs) in a reference transcriptome comprising 16,312 genes. Using SNP markers, we identified strong population clustering largely corresponding to the three species analyzed here (Helianthus annuus, H. petiolaris, H. debilis), with one distinct early generation hybrid. Then, we calculated the proportion of adaptive substitutions fixed by selection (alpha) and identified gene ontology categories with elevated values of alpha. The “response to biotic stimulus” category had the highest mean alpha across the three interspecific comparisons, implying that natural selection imposed by other organisms plays an important role in driving protein evolution in wild sunflowers. Finally, we examined the relationship between protein evolution (dN/dS ratio) and several genomic factors predicted to co-vary with protein evolution (gene expression level, divergence and specificity, genetic divergence [FST], and nucleotide diversity pi). We find that variation in rates of protein divergence was correlated with gene expression level and specificity, consistent with results from a broad range of taxa and timescales. This would in turn imply that these factors govern protein evolution at both microevolutionary and macroevolutionary timescales.
Our results contribute to a general understanding of the determinants of rates of protein evolution and the impact of selection on patterns of polymorphism and divergence.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
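For readers unfamiliar with alpha, the sketch below shows one widely used estimator from the McDonald-Kreitman framework, alpha = 1 - (Ds * Pn) / (Dn * Ps), where Dn and Ds are nonsynonymous and synonymous substitutions between species and Pn and Ps are nonsynonymous and synonymous polymorphisms within a species. The abstract does not say which estimator the authors used, so this choice is an assumption; the function name and the counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Estimate the proportion of adaptive substitutions (alpha).

    Uses the classic McDonald-Kreitman-based estimator
    alpha = 1 - (Ds * Pn) / (Dn * Ps). This is an illustrative
    assumption, not necessarily the method used in the article.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive to estimate alpha")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, for illustration only.
print(mk_alpha(dn=20, ds=40, pn=10, ps=40))  # 0.5
```

A value near 0 suggests that little amino acid divergence needs to be explained by positive selection, while values approaching 1 suggest that most of it does; negative estimates can arise, for example, when slightly deleterious polymorphisms inflate Pn.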
Open Access Article TE-Locate: A Tool to Locate and Group Transposable Element Occurrences Using Paired-End Next-Generation Sequencing Data
Biology 2012, 1(2), 395-410; doi:10.3390/biology1020395
Received: 27 July 2012 / Revised: 22 August 2012 / Accepted: 31 August 2012 / Published: 12 September 2012
Cited by 8 | PDF Full-text (936 KB) | HTML Full-text | XML Full-text | Supplementary Files
Abstract
Transposable elements (TEs) are common mobile DNA elements present in nearly all genomes. Since the movement of TEs within a genome can sometimes have phenotypic consequences, an accurate report of TE actions is desirable. To this end, we developed TE-Locate, a computational tool that uses paired-end reads to identify the novel locations of known TEs. TE-Locate can utilize either a database of TE sequences, or annotated TEs within the reference sequence of interest. This makes TE-Locate useful in the search for any mobile sequence, including retrotransposed gene copies. One major concern is to act on the correct hierarchy level, thereby avoiding an incorrect calling of a single insertion as multiple events of TEs with high sequence similarity. We used the (super)family level, but TE-Locate can also use any other level, right down to the individual transposable element. As an example of analysis with TE-Locate, we used the Swedish population in the 1,001 Arabidopsis genomes project, and presented the biological insights gained from the novel TEs, including the association between different TE superfamilies. The program is freely available, and the URL is provided at the end of the paper.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
Open Access Article Discovery of Single Nucleotide Polymorphisms in Complex Genomes Using SGSautoSNP
Biology 2012, 1(2), 370-382; doi:10.3390/biology1020370
Received: 12 July 2012 / Revised: 9 August 2012 / Accepted: 10 August 2012 / Published: 27 August 2012
Cited by 24 | PDF Full-text (151 KB) | HTML Full-text | XML Full-text
Abstract
Single nucleotide polymorphisms (SNPs) are becoming the dominant form of molecular marker for genetic and genomic analysis. The advances in second generation DNA sequencing provide opportunities to identify very large numbers of SNPs in a range of species. However, SNP identification remains a challenge for large and polyploid genomes due to their size and complexity. We have developed a pipeline for the robust identification of SNPs in large and complex genomes using Illumina second generation DNA sequence data and demonstrated this by the discovery of SNPs in the hexaploid wheat genome. The pipeline, called SGSautoSNP (Second-Generation Sequencing AutoSNP), was applied to discover more than 800,000 SNPs between four hexaploid wheat cultivars across chromosomes 7A, 7B and 7D. All SNPs are presented for download and viewing within a public GBrowse database. Validation suggests that more than 93% of the SNPs represent polymorphisms between wheat cultivars and hence are valuable for detailed diversity analysis, marker assisted selection and genotyping by sequencing. The pipeline produces output in GFF3, VCF, Flapjack or Illumina Infinium design format for further genotyping of diverse populations. As well as providing an unprecedented resource for wheat diversity analysis, the method establishes a foundation for high resolution SNP discovery in other large and complex genomes.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Review

Open Access Review Next-Generation Sequencing: From Understanding Biology to Personalized Medicine
Biology 2013, 2(1), 378-398; doi:10.3390/biology2010378
Received: 21 January 2013 / Revised: 21 January 2013 / Accepted: 4 February 2013 / Published: 1 March 2013
Cited by 8 | PDF Full-text (277 KB) | HTML Full-text | XML Full-text
Abstract
Within just a few years, the new methods for high-throughput next-generation sequencing have generated completely novel insights into the heritability and pathophysiology of human disease. In this review, we wish to highlight the benefits of the current state-of-the-art sequencing technologies for genetic and epigenetic research. We illustrate how these technologies help to constantly improve our understanding of genetic mechanisms in biological systems and summarize the progress made so far. This can be exemplified by the case of heritable heart muscle diseases, so-called cardiomyopathies. Here, next-generation sequencing is able to identify novel disease genes, and first clinical applications demonstrate the successful translation of this technology into personalized patient care.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
Open Access Review Methods, Challenges and Potentials of Single Cell RNA-seq
Biology 2012, 1(3), 658-667; doi:10.3390/biology1030658
Received: 24 October 2012 / Revised: 24 October 2012 / Accepted: 7 November 2012 / Published: 16 November 2012
Cited by 12 | PDF Full-text (221 KB) | HTML Full-text | XML Full-text
Abstract
RNA-sequencing (RNA-seq) has become the tool of choice for transcriptomics. Several recent studies demonstrate its successful adaptation to single cell analysis. This allows new biological insights into cell differentiation, cell-to-cell variation and gene regulation, and how these aspects depend on each other. Here, I review the current single cell RNA-seq (scRNA-seq) efforts and discuss experimental protocols, challenges and potentials.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
Open Access Review Genome Walking by Next Generation Sequencing Approaches
Biology 2012, 1(3), 495-507; doi:10.3390/biology1030495
Received: 1 August 2012 / Revised: 31 August 2012 / Accepted: 25 September 2012 / Published: 1 October 2012
Cited by 2 | PDF Full-text (274 KB) | HTML Full-text | XML Full-text
Abstract
Genome Walking (GW) comprises a number of PCR-based methods for the identification of nucleotide sequences flanking known regions. The different methods have been used for several purposes: from de novo sequencing, useful for the identification of unknown regions, to the characterization of insertion sites for viruses and transposons. In the latter cases Genome Walking methods have been recently boosted by coupling to Next Generation Sequencing technologies. This review will focus on the development of several protocols for the application of Next Generation Sequencing (NGS) technologies to GW, which have been developed in the course of analysis of insertional libraries. These analyses find broad application in protocols for functional genomics and gene therapy. Thanks to the application of NGS technologies, the original vision of GW as a procedure for walking along an unknown genome is now changing into the possibility of observing the parallel marching of hundreds of thousands of primers across the borders of inserted DNA molecules in host genomes.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
Open Access Review Genotyping-by-Sequencing in Plants
Biology 2012, 1(3), 460-483; doi:10.3390/biology1030460
Received: 6 August 2012 / Revised: 7 August 2012 / Accepted: 13 September 2012 / Published: 25 September 2012
Cited by 43 | PDF Full-text (358 KB) | HTML Full-text | XML Full-text
Abstract
The advent of next-generation DNA sequencing (NGS) technologies has led to the development of rapid genome-wide Single Nucleotide Polymorphism (SNP) detection applications in various plant species. Recent improvements in sequencing throughput, combined with an overall decrease in costs per gigabase of sequence, are allowing NGS to be applied not only to the evaluation of small subsets of parental inbred lines, but also to the mapping and characterization of traits of interest in much larger populations. Such an approach, where sequences are used simultaneously to detect and score SNPs, therefore bypassing the entire marker assay development stage, is known as genotyping-by-sequencing (GBS). This review will summarize the current state of GBS in plants and the promises it holds as a genome-wide genotyping application.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
Open Access Review Why Assembling Plant Genome Sequences Is So Challenging
Biology 2012, 1(2), 439-459; doi:10.3390/biology1020439
Received: 16 July 2012 / Revised: 5 September 2012 / Accepted: 6 September 2012 / Published: 18 September 2012
Cited by 11 | PDF Full-text (203 KB) | HTML Full-text | XML Full-text
Abstract
In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequence of plants with relatively small genomes, most of them angiosperms, in particular eudicots, has been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
Open Access Review Next-Generation Sequencing: Application in Liver Cancer—Past, Present and Future?
Biology 2012, 1(2), 383-394; doi:10.3390/biology1020383
Received: 25 July 2012 / Revised: 14 August 2012 / Accepted: 20 August 2012 / Published: 31 August 2012
Cited by 5 | PDF Full-text (143 KB) | HTML Full-text | XML Full-text
Abstract
Hepatocellular Carcinoma (HCC) is the third most deadly malignancy worldwide and is characterized by phenotypic and molecular heterogeneity. In the past two decades, advances in genomic analyses have formed a comprehensive understanding of different underlying pathobiological layers resulting in hepatocarcinogenesis. More recently, improvements of sophisticated next-generation sequencing (NGS) technologies have enabled complete and cost-efficient analyses of cancer genomes at a single nucleotide resolution and advanced into valuable tools in translational medicine. Although the use of NGS in human liver cancer is still in its infancy, great promise rests in the systematic integration of different molecular analyses obtained by these methodologies, i.e., genomics, transcriptomics and epigenomics. This strategy is likely to be helpful in identifying relevant and recurrent pathophysiological hallmarks, thereby improving our limited understanding of liver cancer. Besides tumor heterogeneity, progress in translational oncology is challenged by the amount of biological information and considerable “noise” in the data obtained from different NGS platforms. Nevertheless, the following review aims to provide an overview of the current status of next-generation approaches in liver cancer, and outline the prospects of these technologies in diagnosis, patient classification, and prediction of outcome. Further, the potential of NGS to identify novel applications for concept clinical trials and to accelerate the development of new cancer therapies will be summarized.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
Open Access Review Analyzing the microRNA Transcriptome in Plants Using Deep Sequencing Data
Biology 2012, 1(2), 297-310; doi:10.3390/biology1020297
Received: 24 July 2012 / Revised: 3 August 2012 / Accepted: 9 August 2012 / Published: 15 August 2012
PDF Full-text (313 KB) | HTML Full-text | XML Full-text
Abstract
MicroRNAs (miRNAs) are 20- to 24-nucleotide endogenous small RNA molecules emerging as an important class of sequence-specific, trans-acting regulators for modulating gene expression at the post-transcriptional level. There has been a surge of interest in the past decade in identifying miRNAs and profiling their expression pattern using various experimental approaches. In particular, ultra-deep sampling of specifically prepared low-molecular-weight RNA libraries based on next-generation sequencing technologies has been used successfully in diverse species. The challenge now is to effectively deconvolute the complex sequencing data to provide comprehensive and reliable information on the miRNAs, miRNA precursors, and expression profile of miRNA genes. Here we review the recently developed computational tools and their applications in profiling the miRNA transcriptomes, with an emphasis on the model plant Arabidopsis thaliana. Also highlighted are progress and insight into miRNA biology derived from analyzing available deep sequencing data.
(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Journal Contact

MDPI AG
Biology Editorial Office
St. Alban-Anlage 66, 4052 Basel, Switzerland
biology@mdpi.com
Tel. +41 61 683 77 34
Fax: +41 61 302 89 18