Open Access: This article is freely available.
A Computer Simulator for Assessing Different Challenges and Strategies of de Novo Sequence Assembly
CLC bio, 8200 Aarhus N, Denmark
Department of Biology, Box 118525, University of Florida, Gainesville, Florida, 32611-8525, USA
* Author to whom correspondence should be addressed.
Received: 1 June 2010; in revised form: 18 August 2010 / Accepted: 31 August 2010 / Published: 13 September 2010
Abstract: This study presents a new computer program for assessing the effects of different factors and sequencing strategies on de novo sequence assembly. The program uses reads from actual sequencing studies or from simulations with a reference genome that may also be real or simulated. The simulated reads can be created with our read simulator. They can be of differing length and coverage, consist of paired reads with varying distance, and include sequencing errors such as color space miscalls to imitate SOLiD data. The simulated or real reads are mapped to their reference genome and our assembly simulator is then used to obtain optimal assemblies that are limited only by the distribution of repeats. By way of this mapping, the assembly simulator determines which contigs are theoretically possible, or conversely (and perhaps more importantly), which are not. We illustrate the application and utility of our new simulation tools with several experiments that test the effects of genome complexity (repeats), read length and coverage, word size in De Bruijn graph assembly, and alternative sequencing strategies (e.g., BAC pooling) on sequence assemblies. These experiments highlight just some of the uses of our simulators in the experimental design of sequencing projects and in the further development of assembly algorithms.
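The idea of simulating reads and then asking which contigs are theoretically possible can be illustrated with a minimal sketch. This is not the authors' program: the function names `simulate_reads` and `possible_contigs`, the uniform placement of error-free reads, and the `min_overlap` parameter are all assumptions made for illustration, and the sketch ignores repeats, paired reads, and sequencing errors. It only shows how read length and coverage limit contigs once each read's position on the reference is known.

```python
import random


def simulate_reads(genome_length, read_length, coverage, seed=0):
    """Place error-free reads uniformly on a reference of the given length.

    Returns sorted half-open (start, end) intervals. Hypothetical helper,
    not the read simulator described in the article.
    """
    rng = random.Random(seed)
    n_reads = (genome_length * coverage) // read_length
    starts = sorted(
        rng.randrange(genome_length - read_length + 1) for _ in range(n_reads)
    )
    return [(s, s + read_length) for s in starts]


def possible_contigs(intervals, min_overlap):
    """Merge mapped reads into contigs.

    A read extends the current contig only if it overlaps the contig's
    right end by at least min_overlap bases; otherwise a new contig starts.
    Repeats are ignored here (an assumption of this sketch).
    """
    contigs = []
    for start, end in sorted(intervals):
        if contigs and contigs[-1][1] - start >= min_overlap:
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            contigs.append([start, end])
    return [tuple(c) for c in contigs]


if __name__ == "__main__":
    reads = simulate_reads(genome_length=10_000, read_length=50, coverage=5)
    for overlap in (1, 20, 40):
        n = len(possible_contigs(reads, overlap))
        print(f"min_overlap={overlap}: {n} contigs")
```

Requiring a longer overlap can only split contigs, never join them, so the contig count in this sketch does not decrease as `min_overlap` grows. This loosely mirrors the effect of word size discussed in the abstract.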
Keywords: assembly simulator; read simulator; de novo assembly; sequencing strategies; next generation sequencing
Cite This Article
MDPI and ACS Style
Knudsen, B.; Forsberg, R.; Miyamoto, M.M. A Computer Simulator for Assessing Different Challenges and Strategies of de Novo Sequence Assembly. Genes 2010, 1, 263-282.
Knudsen B, Forsberg R, Miyamoto MM. A Computer Simulator for Assessing Different Challenges and Strategies of de Novo Sequence Assembly. Genes. 2010; 1(2):263-282.
Knudsen, Bjarne; Forsberg, Roald; Miyamoto, Michael M. 2010. "A Computer Simulator for Assessing Different Challenges and Strategies of de Novo Sequence Assembly." Genes 1, no. 2: 263-282.