Discovery of Single Nucleotide Polymorphisms in Complex Genomes Using SGSautoSNP
Abstract
1. Introduction
2. Results and Discussion
Wheat variety | Data generated | Data mapped to reference | % read pairs mapped |
---|---|---|---|
Drysdale | 168 Gbp | 8.65 Gbp | 5.14% |
Excalibur | 146 Gbp | 5.36 Gbp | 3.66% |
Gladius | 180 Gbp | 8.47 Gbp | 4.70% |
RAC875 | 132 Gbp | 4.1 Gbp | 3.10% |
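
As a quick consistency check on the table above, the percentage column can be approximately recomputed from the two Gbp columns. This is a minimal sketch: the dictionary `MAPPING` and the function `percent_mapped` are illustrative names, not part of SGSautoSNP, and it assumes the percentage is simply mapped data divided by generated data. The published column counts read pairs rather than bases and was presumably derived from unrounded values, so differences of about 0.01 percentage points are expected.

```python
# Figures copied from the table above:
# variety -> (Gbp generated, Gbp mapped, published % read pairs mapped)
MAPPING = {
    "Drysdale": (168.0, 8.65, 5.14),
    "Excalibur": (146.0, 5.36, 3.66),
    "Gladius": (180.0, 8.47, 4.70),
    "RAC875": (132.0, 4.1, 3.10),
}


def percent_mapped(generated_gbp, mapped_gbp):
    """Return mapped data as a percentage of generated data."""
    return 100.0 * mapped_gbp / generated_gbp


# Recompute the percentage for every variety from the rounded Gbp figures.
recomputed = {
    variety: percent_mapped(generated, mapped)
    for variety, (generated, mapped, _published) in MAPPING.items()
}

for variety, (_generated, _mapped, published) in MAPPING.items():
    print(
        f"{variety:<10} recomputed {recomputed[variety]:.2f}%  "
        f"published {published:.2f}%"
    )
```

Only a small fraction of each data set (roughly 3% to 5%) maps to the reference, which is what the table itself reports.
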
SNP Primer Name | Forward Primer | Reverse Primer | SNP score | Validation |
---|---|---|---|---|
UQ7A27 | TAACATAAGCAAAGTTCTATTA | TTTGGAACACAATCGGAACTT | 6 | Failed |
UQ7A1397 | TCTATTGGATTCTTTCCGAT | TCACCCTGTGGAATGAAAGA | 5 | Failed |
UQ7A5622 | TTAGCCAAAATGGACCCAAA | CCTCTTTATTCAATCTGGAAACG | 2 | True SNP |
UQ7A129835 | TTCTTACTGTGGCTGCATCA | GCCATCCTAAACGACCTTCA | 5 | True SNP |
UQ7A9400 | GCCCATATGCAGTTCATGGT | AGAGCCAAACCTTCCCTGAT | 2 | Failed |
UQ7A7915 | CATGCCAACCCAAGTAGACC | GAAGCGTGAAAATTTCGTGA | 6 | True SNP |
UQ7A6107 | TGGTGTTTACGCTGAAGTTACC | CTGGCCTGGGCACTACATA | 6 | True SNP |
UQ7A2603 | GTCACCAACCAGCTCGAAAT | TTGTAGCTTTGCCTCTGTGAA | 2 | Failed |
UQ7A3491 | AGTCGCCGGCAGTAAAAATA | CCGAAGAAAATGTGGTGGAG | 4 | True SNP |
UQ7A4532 | TTTCCTCTAGATCTGTGCAAAATG | CATCCAGGACTGCATAAGCTC | 6 | True SNP |
UQ7A100138 | TCCCTGGTCCACGAGTTATT | AAATGGTTTGAGCCTTGTGC | 7 | Failed |
UQ7A136305 | CATCATCTTTGAAAAATCCTAGCC | TGTTCTGCAAGCTTCGTCTG | 5 | True SNP |
UQ7A155877 | AAGCTGTTGTGCCAGTGTTG | GAGCTAGCGTCGCTGACATA | 4 | True SNP |
UQ7A180868 | GACCGTCATCGAATGTAGCA | TCGTCCACCCAGACCTTATC | 3 | True SNP |
UQ7A287189 | GGCGATCATCACTTAAGAAACC | CAGTAATGAGGTTTCTGCTTGG | 2 | Failed |
UQ7A322716 | TCTGTTCGCAAACCAACG | GTGCGTTATCAGGGGAACAT | 11 | True SNP |
UQ7A57227 | ATGGGTGAAGGGAATACAGC | TGCATGCACATACAACCAAA | 5 | True SNP |
UQ7A87191 | TCAGTTCGGTAAGGATGAAGA | GAAGCAGTATGCATCTAAACTTTG | 6 | Heterozygous |
UQ7B21 | GCAGGGTTAATTTCTAGCAAGC | GCCTTTTATCCAAAGCCATC | 8 | Failed |
UQ7B484 | CTCAACCTCCCAAGCATGA | GCTATCCAGCTACCCTGTGC | 11 | Failed |
UQ7B3940 | GCCAGAGGCACTAGCATCAC | GGTAATTGTGGAGCAAGCAA | 6 | True SNP |
UQ7B4960 | GCATGGCATTTCAAGATCAG | GGAGGAGGACAAAGCCAGAT | 5 | True SNP |
UQ7B5991 | CCAAGCCACCACCCTTTAT | TAATCCCCGTCATCTCGAAG | 4 | True SNP |
UQ7B120997 | CTCCTCAGATGACCAATTTGC | CACCAAAATATGCTGTACAATTCTATG | 7 | Failed |
UQ7B256895 | GCAGCAGAGGTAGGCACTTC | GAAATGCTTCGAGTGTGGTG | 11 | True SNP |
UQ7B64318 | GGGTCCAGACTTCCACGTTA | CCCACATTAATTTGTACGACCTC | 6 | Failed |
UQ7B97303 | TGATTCGAGCCCATATAGGAA | AGCCATGCGGAAATATTGAG | 8 | True SNP |
UQ7D283 | TGAGTAAGACAACAATCAGAGCA | CAATGCGAGCAAAAAGATCA | 5 | True SNP |
UQ7D429 | TGTGCTGACGTGGCATCTAT | GCATGTGGAAAACGAGTGTG | 3 | True SNP |
UQ7D689 | CATCTGGCCTCAACATCAAA | TGTTGGTAGTGAGGCACTTCTT | 9 | Failed |
UQ7D948 | GGCGATACTCGATGAAAGAAA | TTGGAAACTACAATTGCACAAC | 9 | True SNP |
UQ7D1189 | GCGTGGAGTAGAGGGACAAG | TCCAAAAAGCAAAACAAATGC | 4 | True SNP |
UQ7D1491 | AGCGCAAGGAGGAGGTTAGT | GAGCCAAGTCCTTGTCAATTT | 7 | True SNP |
UQ7D1846 | AATGTGTTCCATCCAAGACG | GCCAAGGTCGACATGTGATA | 10 | True SNP |
UQ7D2314 | AAACAAGTCTGTGTTGCGTCA | TGCAGATACATGGCTCCAGA | 2 | Monomorphic |
UQ7D20375 | CTGCCACCAAACGGATTAAC | AATGCATTGGCAGTCACAAG | 6 | True SNP |
UQ7D27168 | TAATGCTATGCCGTGTCAGC | GCCACCTATTATTGAAGGCATC | 2 | True SNP |
UQ7D38754 | GAGCGAGCAATGCTAGTGTG | GAACCCATTTGATAACCGTGA | 3 | Failed |
UQ7D59683 | CGTCCACATTGTTGCAAATC | TTGACCCTGAAGGAAGGATG | 6 | True SNP |
UQ7D68910 | TTGCTTTATGCCACTGGAGA | TAGGCCGTGAAACATCAACA | 3 | True SNP |
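
The validation outcomes in the table above can be tallied with a few lines of Python. This is a sketch only: the rows below repeat just the primer name, SNP score, and validation columns from the table, `ROWS` and `tally` are illustrative names, and treating "Failed" as an assay that gave no usable result (and so excluding it from the denominator) is an assumption rather than something stated in the table.

```python
from collections import Counter

# name|SNP score|validation, copied from the table above (primer sequences omitted)
ROWS = """\
UQ7A27|6|Failed
UQ7A1397|5|Failed
UQ7A5622|2|True SNP
UQ7A129835|5|True SNP
UQ7A9400|2|Failed
UQ7A7915|6|True SNP
UQ7A6107|6|True SNP
UQ7A2603|2|Failed
UQ7A3491|4|True SNP
UQ7A4532|6|True SNP
UQ7A100138|7|Failed
UQ7A136305|5|True SNP
UQ7A155877|4|True SNP
UQ7A180868|3|True SNP
UQ7A287189|2|Failed
UQ7A322716|11|True SNP
UQ7A57227|5|True SNP
UQ7A87191|6|Heterozygous
UQ7B21|8|Failed
UQ7B484|11|Failed
UQ7B3940|6|True SNP
UQ7B4960|5|True SNP
UQ7B5991|4|True SNP
UQ7B120997|7|Failed
UQ7B256895|11|True SNP
UQ7B64318|6|Failed
UQ7B97303|8|True SNP
UQ7D283|5|True SNP
UQ7D429|3|True SNP
UQ7D689|9|Failed
UQ7D948|9|True SNP
UQ7D1189|4|True SNP
UQ7D1491|7|True SNP
UQ7D1846|10|True SNP
UQ7D2314|2|Monomorphic
UQ7D20375|6|True SNP
UQ7D27168|2|True SNP
UQ7D38754|3|Failed
UQ7D59683|6|True SNP
UQ7D68910|3|True SNP
"""


def tally(rows):
    """Count how many assays fall into each validation outcome."""
    counts = Counter()
    for line in rows.strip().splitlines():
        _name, _score, outcome = line.split("|")
        counts[outcome] += 1
    return counts


counts = tally(ROWS)
total = sum(counts.values())
# Assumption: "Failed" assays gave no result and are left out of the rate.
scored = total - counts["Failed"]
true_snp_rate = 100.0 * counts["True SNP"] / scored

print(dict(counts))
print(f"{counts['True SNP']} of {scored} scored assays are true SNPs "
      f"({true_snp_rate:.1f}%)")
```

Under that assumption, 26 of the 28 assays that produced a result are true SNPs, with one heterozygous and one monomorphic call making up the remainder.
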
3. Experimental Section
3.1. Data and Dependencies
3.2. Read Mapping
3.3. SNP Discovery
3.4. SNP Filtering
3.5. Generating a Consensus Sequence
3.6. Generating Illumina Marker Assay Files
3.7. Validation
4. Conclusions
Acknowledgments
© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
Lorenc, M.T.; Hayashi, S.; Stiller, J.; Lee, H.; Manoli, S.; Ruperao, P.; Visendi, P.; Berkman, P.J.; Lai, K.; Batley, J.; Edwards, D. Discovery of Single Nucleotide Polymorphisms in Complex Genomes Using SGSautoSNP. Biology 2012, 1, 370–382. https://doi.org/10.3390/biology1020370