Research

Open Access Article

FLEXBAR—Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms

by Matthias Dodt, Johannes T. Roehr, Rina Ahmed and Christoph Dieterich

Biology 2012, 1(3), 895-905; https://doi.org/10.3390/biology1030895 - 14 Dec 2012

Cited by 465 | Viewed by 21636

Quantitative and systems biology approaches benefit from the unprecedented depth of next-generation sequencing. A typical experiment yields millions of short reads, which often carry particular sequence tags. These tags may be: (a) specific to the sequencing platform and library construction method (e.g., adapter sequences); (b) introduced by experimental design (e.g., sample barcodes); or (c) a biological signal (e.g., splice leader sequences in nematodes). Our software FLEXBAR enables accurate recognition, sorting and trimming of sequence tags with maximal flexibility, based on exact overlap sequence alignment. The software supports data formats from all current sequencing platforms, including color-space reads. FLEXBAR maintains read pairings and processes separate barcode reads on demand. It also allows fine-grained adjustment of sequence tag detection parameters and search regions. FLEXBAR is multi-threaded and combines speed with precision. Even complex read processing scenarios can be executed with a single command line call. We demonstrate the utility of the software in terms of read mapping applications, library demultiplexing and splice leader detection. FLEXBAR and additional information are available for academic use from the website: http://sourceforge.net/projects/flexbar/.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Open Access Article

Whole Genome Sequencing and a New Bioinformatics Platform Allow for Rapid Gene Identification in D. melanogaster EMS Screens

by Michael A. Gonzalez, Derek Van Booven, William Hulme, Rick H. Ulloa, Rafael F. Acosta Lebrigio, Jeannette Osterloh, Mary Logan, Marc Freeman and Stephan Zuchner

Biology 2012, 1(3), 766-777; https://doi.org/10.3390/biology1030766 - 05 Dec 2012

Cited by 9 | Viewed by 7621

Abstract

Forward genetic screens in Drosophila melanogaster using ethyl methanesulfonate (EMS) mutagenesis are a powerful approach for identifying genes that modulate specific biological processes in an in vivo setting. The mapping of genes that contain randomly-induced point mutations has become more efficient in Drosophila thanks to the maturation and availability of many types of genetic tools. However, classic approaches to gene mapping are relatively slow and ultimately require extensive Sanger sequencing of candidate chromosomal loci. With the advent of new high-throughput sequencing techniques, it is increasingly efficient to directly re-sequence the whole genome of model organisms. This approach, in combination with traditional chromosomal mapping, has the potential to greatly simplify and accelerate mutation identification in mutants generated in EMS screens. Here we show that next-generation sequencing (NGS) is an accurate and efficient tool for high-throughput sequencing and mutation discovery in Drosophila melanogaster. As a test case, mutant strains of Drosophila that exhibited long-term survival of severed peripheral axons were identified in a forward EMS mutagenesis screen. All mutants were recessive and fell into a single lethal complementation group, which suggested that a single gene was responsible for the protective axon degenerative phenotype. Whole genome sequencing of these mutants identified the underlying gene ect4. To improve the process of genome-wide mutation identification, we developed Genomes Management Application (GEM.app, https://genomics.med.miami.edu), a graphical online user interface to a custom query framework. Using a custom GEM.app query, we found that each mutant carried a unique nonsense mutation in the gene ect4 (dSarm), which was recently shown by Osterloh et al. to be essential for the activation of axonal degeneration. Our results demonstrate the current advantages and limitations of NGS in Drosophila, and we introduce GEM.app as a simple yet powerful genomics analysis tool for the Drosophila community. At a current cost of <$1,000 per genome, NGS should thus become a standard gene discovery tool in EMS-induced forward genetic screens.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Open Access Article

The Population Genomics of Sunflowers and Genomic Determinants of Protein Evolution Revealed by RNAseq

by Sébastien Renaut, Christopher J. Grassa, Brook T. Moyers, Nolan C. Kane and Loren H. Rieseberg

Biology 2012, 1(3), 575-596; https://doi.org/10.3390/biology1030575 - 25 Oct 2012

Cited by 28 | Viewed by 9856

Abstract

Few studies have investigated the causes of evolutionary rate variation among plant nuclear genes, especially in recently diverged species still capable of hybridizing in the wild. The recent advent of Next Generation Sequencing (NGS) permits investigation of genome-wide rates of protein evolution and the role of selection in generating and maintaining divergence. Here, we use individual whole-transcriptome sequencing (RNAseq) to refine our understanding of the population genomics of wild species of sunflowers (Helianthus spp.) and the factors that affect rates of protein evolution. We aligned 35 GB of transcriptome sequencing data and identified 433,257 polymorphic sites (SNPs) in a reference transcriptome comprising 16,312 genes. Using SNP markers, we identified strong population clustering largely corresponding to the three species analyzed here (Helianthus annuus, H. petiolaris, H. debilis), with one distinct early generation hybrid. Then, we calculated the proportions of adaptive substitutions fixed by selection (alpha) and identified gene ontology categories with elevated values of alpha. The “response to biotic stimulus” category had the highest mean alpha across the three interspecific comparisons, implying that natural selection imposed by other organisms plays an important role in driving protein evolution in wild sunflowers. Finally, we examined the relationship between protein evolution (d_N/d_S ratio) and several genomic factors predicted to co-vary with protein evolution (gene expression level, divergence and specificity, genetic divergence [F_ST], and nucleotide diversity pi). We find that variation in rates of protein divergence is correlated with gene expression level and specificity, consistent with results from a broad range of taxa and timescales. This would in turn imply that these factors govern protein evolution at both microevolutionary and macroevolutionary timescales. Our results contribute to a general understanding of the determinants of rates of protein evolution and the impact of selection on patterns of polymorphism and divergence.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Open Access Article

TE-Locate: A Tool to Locate and Group Transposable Element Occurrences Using Paired-End Next-Generation Sequencing Data

by Alexander Platzer, Viktoria Nizhynska and Quan Long

Biology 2012, 1(2), 395-410; https://doi.org/10.3390/biology1020395 - 12 Sep 2012

Cited by 30 | Viewed by 9914

Abstract

Transposable elements (TEs) are common mobile DNA elements present in nearly all genomes. Since the movement of TEs within a genome can sometimes have phenotypic consequences, an accurate report of TE actions is desirable. To this end, we developed TE-Locate, a computational tool that uses paired-end reads to identify the novel locations of known TEs. TE-Locate can utilize either a database of TE sequences, or annotated TEs within the reference sequence of interest. This makes TE-Locate useful in the search for any mobile sequence, including retrotransposed gene copies. One major concern is to act on the correct hierarchy level, thereby avoiding incorrectly calling a single insertion as multiple events of TEs with high sequence similarity. We used the (super)family level, but TE-Locate can also use any other level, right down to the individual transposable element. As an example of analysis with TE-Locate, we used the Swedish population in the 1,001 Arabidopsis genomes project, and presented the biological insights gained from the novel TEs, including the association between different TE superfamilies. The program is freely available, and the URL is provided at the end of the paper.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Open Access Article

Discovery of Single Nucleotide Polymorphisms in Complex Genomes Using SGSautoSNP

by Michał T. Lorenc, Satomi Hayashi, Jiri Stiller, Hong Lee, Sahana Manoli, Pradeep Ruperao, Paul Visendi, Paul J. Berkman, Kaitao Lai, Jacqueline Batley and David Edwards

Biology 2012, 1(2), 370-382; https://doi.org/10.3390/biology1020370 - 27 Aug 2012

Cited by 54 | Viewed by 9918

Abstract

Single nucleotide polymorphisms (SNPs) are becoming the dominant form of molecular marker for genetic and genomic analysis. The advances in second generation DNA sequencing provide opportunities to identify very large numbers of SNPs in a range of species. However, SNP identification remains a challenge for large and polyploid genomes due to their size and complexity. We have developed a pipeline for the robust identification of SNPs in large and complex genomes using Illumina second generation DNA sequence data and demonstrated this by the discovery of SNPs in the hexaploid wheat genome. The pipeline, called SGSautoSNP (Second-Generation Sequencing AutoSNP), was applied to discover more than 800,000 SNPs between four hexaploid wheat cultivars across chromosomes 7A, 7B and 7D. All SNPs are presented for download and viewing within a public GBrowse database. Validation suggests that more than 93% of the SNPs represent polymorphisms between wheat cultivars and hence are valuable for detailed diversity analysis, marker assisted selection and genotyping by sequencing. The pipeline produces output in GFF3, VCF, Flapjack or Illumina Infinium design format for further genotyping of diverse populations. As well as providing an unprecedented resource for wheat diversity analysis, the method establishes a foundation for high resolution SNP discovery in other large and complex genomes.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Review

Open Access Review

Next-Generation Sequencing: From Understanding Biology to Personalized Medicine

by Karen S. Frese, Hugo A. Katus and Benjamin Meder

Biology 2013, 2(1), 378-398; https://doi.org/10.3390/biology2010378 - 01 Mar 2013

Cited by 34 | Viewed by 16380

Abstract

Within just a few years, the new methods for high-throughput next-generation sequencing have generated completely novel insights into the heritability and pathophysiology of human disease. In this review, we highlight the benefits of the current state-of-the-art sequencing technologies for genetic and epigenetic research. We illustrate how these technologies help to constantly improve our understanding of genetic mechanisms in biological systems and summarize the progress made so far. This can be exemplified by the case of heritable heart muscle diseases, so-called cardiomyopathies. Here, next-generation sequencing is able to identify novel disease genes, and first clinical applications demonstrate the successful translation of this technology into personalized patient care.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Open Access Review

Methods, Challenges and Potentials of Single Cell RNA-seq

by Daniel Hebenstreit

Biology 2012, 1(3), 658-667; https://doi.org/10.3390/biology1030658 - 16 Nov 2012

Cited by 42 | Viewed by 13912

Abstract

RNA-sequencing (RNA-seq) has become the tool of choice for transcriptomics. Several recent studies demonstrate its successful adaptation to single cell analysis. This allows new biological insights into cell differentiation, cell-to-cell variation and gene regulation, and how these aspects depend on each other. Here, I review the current single cell RNA-seq (scRNA-seq) efforts and discuss experimental protocols, challenges and potentials.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Open Access Review

Genome Walking by Next Generation Sequencing Approaches

by Mariateresa Volpicella, Claudia Leoni, Alessandra Costanza, Immacolata Fanizza, Antonio Placido and Luigi R. Ceci

Biology 2012, 1(3), 495-507; https://doi.org/10.3390/biology1030495 - 01 Oct 2012

Cited by 22 | Viewed by 13278

Abstract

Genome Walking (GW) comprises a number of PCR-based methods for the identification of nucleotide sequences flanking known regions. The different methods have been used for several purposes: from de novo sequencing, useful for the identification of unknown regions, to the characterization of insertion sites for viruses and transposons. In the latter cases, Genome Walking methods have recently been boosted by coupling to Next Generation Sequencing (NGS) technologies. This review focuses on several protocols for applying NGS technologies to GW, which have been developed in the course of analysis of insertional libraries. These analyses find broad application in protocols for functional genomics and gene therapy. Thanks to the application of NGS technologies, the original vision of GW as a procedure for walking along an unknown genome is now changing into the possibility of observing the parallel marching of hundreds of thousands of primers across the borders of inserted DNA molecules in host genomes.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Open Access Review

Genotyping-by-Sequencing in Plants

by Stéphane Deschamps, Victor Llaca and Gregory D. May

Biology 2012, 1(3), 460-483; https://doi.org/10.3390/biology1030460 - 25 Sep 2012

Cited by 230 | Viewed by 28343

Abstract

The advent of next-generation DNA sequencing (NGS) technologies has led to the development of rapid genome-wide Single Nucleotide Polymorphism (SNP) detection applications in various plant species. Recent improvements in sequencing throughput, combined with an overall decrease in costs per gigabase of sequence, are allowing NGS to be applied not only to the evaluation of small subsets of parental inbred lines, but also to the mapping and characterization of traits of interest in much larger populations. Such an approach, where sequences are used simultaneously to detect and score SNPs, therefore bypassing the entire marker assay development stage, is known as genotyping-by-sequencing (GBS). This review summarizes the current state of GBS in plants and the promises it holds as a genome-wide genotyping application.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Open Access Review

Why Assembling Plant Genome Sequences Is So Challenging

by Manuel Gonzalo Claros, Rocío Bautista, Darío Guerrero-Fernández, Hicham Benzerki, Pedro Seoane and Noé Fernández-Pozo

Biology 2012, 1(2), 439-459; https://doi.org/10.3390/biology1020439 - 18 Sep 2012

Cited by 88 | Viewed by 13150

Abstract

In spite of the biological and economic importance of plants, relatively few plant species have been sequenced. Only the genome sequences of plants with relatively small genomes, most of them angiosperms, in particular eudicots, have been determined. The arrival of next-generation sequencing technologies has allowed the rapid and efficient development of new genomic resources for non-model or orphan plant species. But the sequencing pace of plants is far from that of animals and microorganisms. This review focuses on the typical challenges of plant genomes that can explain why plant genomics is less developed than animal genomics. Explanations about the impact of some confounding factors emerging from the nature of plant genomes are given. As a result of these challenges and confounding factors, the correct assembly and annotation of plant genomes is hindered, genome drafts are produced, and advances in plant genomics are delayed.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Open Access Review

Next-Generation Sequencing: Application in Liver Cancer—Past, Present and Future?

by Jens U. Marquardt and Jesper B. Andersen

Biology 2012, 1(2), 383-394; https://doi.org/10.3390/biology1020383 - 31 Aug 2012

Cited by 13 | Viewed by 6914

Abstract

Hepatocellular Carcinoma (HCC) is the third most deadly malignancy worldwide, characterized by phenotypic and molecular heterogeneity. In the past two decades, advances in genomic analyses have formed a comprehensive understanding of different underlying pathobiological layers resulting in hepatocarcinogenesis. More recently, improvements of sophisticated next-generation sequencing (NGS) technologies have enabled complete and cost-efficient analyses of cancer genomes at a single nucleotide resolution and advanced into valuable tools in translational medicine. Although the use of NGS in human liver cancer is still in its infancy, great promise rests in the systematic integration of different molecular analyses obtained by these methodologies, i.e., genomics, transcriptomics and epigenomics. This strategy is likely to be helpful in identifying relevant and recurrent pathophysiological hallmarks, thereby improving our limited understanding of liver cancer. Besides tumor heterogeneity, progress in translational oncology is challenged by the amount of biological information and considerable “noise” in the data obtained from different NGS platforms. Nevertheless, the following review aims to provide an overview of the current status of next-generation approaches in liver cancer, and outline the prospects of these technologies in diagnosis, patient classification, and prediction of outcome. Further, the potential of NGS to identify novel applications for concept clinical trials and to accelerate the development of new cancer therapies will be summarized.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)

Open Access Review

Analyzing the microRNA Transcriptome in Plants Using Deep Sequencing Data

by Xiaozeng Yang and Lei Li

Biology 2012, 1(2), 297-310; https://doi.org/10.3390/biology1020297 - 15 Aug 2012

Cited by 12 | Viewed by 10422

Abstract

MicroRNAs (miRNAs) are 20- to 24-nucleotide endogenous small RNA molecules emerging as an important class of sequence-specific, trans-acting regulators for modulating gene expression at the post-transcriptional level. There has been a surge of interest in the past decade in identifying miRNAs and profiling their expression patterns using various experimental approaches. In particular, ultra-deep sampling of specifically prepared low-molecular-weight RNA libraries based on next-generation sequencing technologies has been used successfully in diverse species. The challenge now is to effectively deconvolute the complex sequencing data to provide comprehensive and reliable information on the miRNAs, miRNA precursors, and expression profiles of miRNA genes. Here we review the recently developed computational tools and their applications in profiling the miRNA transcriptomes, with an emphasis on the model plant Arabidopsis thaliana. Also highlighted are progress and insights into miRNA biology derived from analyzing available deep sequencing data.

(This article belongs to the Special Issue Next Generation Sequencing Approaches in Biology)
