Genes 2017, 8(10), 238; https://doi.org/10.3390/genes8100238

Communication
Development of Genome-Wide SSR Markers from Angelica gigas Nakai Using Next Generation Sequencing
1 Department of Industrial Plant Science & Technology, Chungbuk National University, Chungju 28644, Korea
2 Forest Medicinal Resources Research Center, National Institute of Forest Science, Yeongju 36040, Korea
3 Department of Herbal Crop Research, National Institute of Horticultural and Herbal Science, Rural Development Administration, Eumseong 27709, Korea
4 Research Institute of Climate Change and Agriculture, National Institute of Horticultural and Herbal Science, Rural Development Administration, Jeju 63240, Korea
5 TheragenEtex Bio Institute, Suwon 16229, Korea
6 Life Sciences Research Institute, Biomedic Co., Ltd., Bucheon 14548, Korea
7 Department of Biosystems Engineering, Chungbuk National University, Chungju 28644, Korea
8 Korea Zoonosis Research Institute, Chonbuk National University, Iksan 54531, Korea
* Correspondence: [email protected]; Tel.: +82-43-261-3373
These authors contributed equally to this work.
Received: 2 August 2017 / Accepted: 18 September 2017 / Published: 21 September 2017

Abstract
Angelica gigas Nakai is an important medicinal herb, widely utilized in Asian countries especially in Korea, Japan, and China. Although it is a vital medicinal herb, the lack of sequencing data and efficient molecular markers has limited the application of a genetic approach for horticultural improvements. Simple sequence repeats (SSRs) are universally accepted molecular markers for population structure study. In this study, we found over 130,000 SSRs, ranging from di- to deca-nucleotide motifs, using the genome sequence of Manchu variety (MV) of A. gigas, derived from next generation sequencing (NGS). From the putative SSR regions identified, a total of 16,496 primer sets were successfully designed. Among them, we selected 848 SSR markers that showed polymorphism from in silico analysis and contained tri- to hexa-nucleotide motifs. We tested 36 SSR primer sets for polymorphism in 16 A. gigas accessions. The average polymorphism information content (PIC) was 0.69; the average observed heterozygosity (HO) values, and the expected heterozygosity (HE) values were 0.53 and 0.73, respectively. These newly developed SSR markers would be useful tools for molecular genetics, genotype identification, genetic mapping, molecular breeding, and studying species relationships of the Angelica genus.
Keywords:
Angelica gigas Nakai; next generation sequencing (NGS); simple sequence repeat (SSR)

1. Introduction

The genus Angelica, which includes more than 60 species, comprises important medicinal herbs in Far East countries, especially Korea, Japan, and China. Many of these species have been used in traditional medicine to treat a range of diseases in Asian and Western countries [1]. Angelica species that grow naturally in South Korea, besides Angelica gigas Nakai, include Angelica acutiloba Kitagawa, Angelica archangelica L., Angelica atropurpurea L., Angelica dahurica (Fisch. ex Hoffm.) Benth. & Hook. f. ex Franch. & Sav., Angelica decursiva (Miq.) Franch. & Sav., Angelica genuflexa Nutt. ex Torr. & A. Gray, Angelica glauca Edgew., Angelica jaluana Nakai, Angelica japonica A. Gray, Angelica koreana L., Angelica sinensis Diels, Angelica sylvestris L., and others.
The root of A. gigas contains active compounds, such as decursin and decursinol angelate [2], which are used in treating gynecological and anemic health issues [3,4]. They are also used for stimulating the immune system and for constipation relief [1,5]. Herbalists consider A. gigas equivalent to ginseng because of its reported remedial properties, including anti-cancer [6], anti-oxidant [7], neuro-protective, anti-flu, anti-hepatitis [1], and anti-bacterial activities [8]. The A. gigas root also contains important chemical compounds such as pyranocoumarins, essential oils, and polyacetylenes.
Due to the scarcity of efficient molecular markers, there is a dearth of information about the genetic relationships and genetic diversity among breeding populations of Angelica crops [9]. Thus, to address these issues, during the past 30 years many kinds of markers have been developed and used for crop genetics and breeding [10]. The improvement in DNA sequencing and genotyping in conjunction with the development of computer programs have helped the design of molecular markers such as simple sequence repeats (SSRs) in minor crops [11].
Molecular markers are widely used in various areas such as genetic-diversity characterization, gene-flow studies, quantitative-trait locus (QTL) mapping, and evolutionary studies [12]. Accumulated plant genome sequences have accelerated single nucleotide polymorphism (SNP) marker development. Numerous SNP markers have been successfully used to develop various crop cultivars [13]. However, SNP marker technology requires special equipment and is costly. In contrast, SSR markers are convenient and easy to use. SSRs are tandemly repeated nucleotide sequence motifs [14] that are dispersed throughout the genome; they may exhibit locus-specific codominance and high polymorphism rates, and have a high level of transferability [15,16,17]. Furthermore, molecular markers can be used to distinguish cultivars or plants originating from different locations. SSR markers are effective tools that play an important role in plant cultivar identification and genetic diversity analysis [18]. Next generation sequencing (NGS) technologies can be used for fast and cost-effective SSR discovery in many crops [19].
The Angelica species used in conventional medicine vary by country according to specific regulations, i.e., A. gigas in Korea, A. sinensis in China, and A. acutiloba in Japan. Because the names of these Angelica species are similar, they can be confused in the market [20]. SSR markers are especially useful for distinguishing closely related genotypes, which is why they are widely used in genetic studies and in the identification of closely related cultivars. Such SSR markers could help to address these problems in the Angelica genus [21].
In this study, we developed polymorphic SSR markers from several A. gigas accessions using NGS and bioinformatics approaches. These markers could be used for genetics and breeding of the Angelica genus.

2. Material and Methods

2.1. Plant Material and DNA Isolation

Sixteen widely cultivated A. gigas accessions were collected in Korea (Table S1). All of the collected roots or seeds were germinated and grown in the Chungbuk National University greenhouse. Young leaves or fresh seedlings were used for extraction of the genomic DNA (gDNA), which is required for library construction and sequencing. Fresh leaves were ground with liquid nitrogen and DNA was extracted using a DNeasy Plant Mini kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions.

2.2. Genomic DNA Sequencing and Sequence Assembly

The libraries were constructed from five A. gigas accessions: Manchu variety (MV), Hwangje variety (HV), Gangwon local variety (GLV), Jecheon local variety (JLV), and Sancheong local variety (SLV). One sequencing lane was used for MV and 1/3 lane for each of the other four varieties. Sequencing libraries for DNA samples were prepared with a TruSeq Nano DNA Sample Prep Kit, according to the manufacturer's instructions (Illumina, Inc., San Diego, CA, USA). The libraries were subjected to paired-end sequencing with a 101 bp read length using the Illumina HiSeq 2500 platform (Illumina). Errors in the short reads of each sample were corrected by using the SOAPec part of SOAPdenovo2 version 2.04 [22]. After error correction, short reads corresponding to ≥13× of the genome size, estimated on the basis of a k-mer frequency spectrum, were assembled by using SOAPdenovo2 with a k-mer size of 53. Repetitive DNA sequences, such as microsatellites, transposable elements, and rDNAs, were screened in the assembled contigs by using RepeatMasker version 4.0.5 [23] and RepeatModeler version 1.0.8 [23]. From the resulting repeat-masked sequences, gene models were built by ab initio prediction coupled with transcript alignment using AUGUSTUS version 3.1 [24], with the training gene set derived from the genome of Platycodon grandiflorum (unpublished), which belongs to the subclass Asterids. Genes were searched against the UniProt and NCBI non-redundant (NR) protein databases using BLASTX, with a cutoff E-value of 10^−10. Protein domains in genes were searched by using InterProScan version 5.17-56.0 [25].

2.3. SSR Findings and Primer Designs

Simple sequence repeats were identified by using SSR Finder [26] with the following parameters: (1) microsatellites consisting of tandem repeats of a 2 to 10 bp unit with a minimum of five repeats; and (2) no variation (mutation) within the repeat motifs. Primers were designed from the flanking sequences of SSR loci by using Primer3 version 2.2.3 [27] with the following parameters: primer length = 18–26 bp (Opt. 23 bp); GC % = 50–70% (Opt. 60%); temperature = 55–62 °C (Opt. 58 °C); and product size range = 150–200 bp. Repetitive DNA and duplicate sequences were excluded, and primers were selected from the unique genomic regions. Forward primers were labeled with a virtual dye 6-FAM, NED, VIC, or PET (Applied Biosystems, Foster City, CA, USA).
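
The search criteria above (perfect tandem repeats of a 2 to 10 bp unit, at least five copies, no interruptions) can be illustrated with a short Python sketch. This is not the SSR Finder program used in the study; the function names `find_ssrs` and `is_primitive` are hypothetical, and skipping motifs that are themselves repeats of a shorter unit (for example ATAT) is an assumption added here so that each locus is reported once.

```python
import re


def is_primitive(motif):
    """Return False if the motif is itself a repeat of a shorter unit (e.g. ATAT, AA)."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif[:d] * (n // d) == motif:
            return False
    return True


def find_ssrs(seq, min_unit=2, max_unit=10, min_repeats=5):
    """Find perfect SSRs; returns sorted (start, end, motif, repeat_count) tuples."""
    seq = seq.upper()
    hits = []
    for k in range(min_unit, max_unit + 1):
        # One k-bp unit followed by at least (min_repeats - 1) exact copies of it.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # already reported under its shorter unit
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


# Toy sequence (not from the A. gigas assembly): an (AT)6 and an (AGC)5 repeat.
toy = "GGC" + "AT" * 6 + "CCGTA" + "AGC" * 5 + "TT"
for start, end, motif, n in find_ssrs(toy):
    print(start, end, motif, n)
```

Coordinates are 0-based and end-exclusive, so the flanking sequence needed for primer design can be sliced directly from the same string.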

2.4. In Silico SSR Polymorphism Screening

In silico SSR polymorphism screening was performed by using the BLAST tool of CLC Main Workbench version 6.8.4 (CLC bio, Aarhus, Denmark). The HV, GLV, JLV and SLV sequences were assembled by using CLC Genomics Workbench version 7.5 (CLC bio, Aarhus, Denmark) and used to build the BLAST database. The SSR regions (450 bp each) identified from the MV sequences were used as query sequences in the BLAST search. BLAST analyses were performed with the following parameters: BLAST program = blastn (DNA sequence and database); Target = BLAST database (HV, GLV, JLV and SLV sequences); Number of threads = 1; Choose filter = Filter low complexity; Expect = 10.0; Word size = 3; Match/mismatch cost = match 1, mismatch −3; Gap costs = Existence 5, Extension 2; and Max number of hit sequences = 250 (Figure S1).

2.5. PCR Amplification and Genotyping

The PCR reaction mixture (20 μL) consisted of 20 ng of gDNA, 1× HS™ Taq DNA polymerase buffer, 1.5 mM MgCl2, 0.2 mM of each dNTP, 0.2 μM of each primer, and 1.25 units of HS™ Taq DNA polymerase (Dongsheng, Göttingen, Germany). DNA amplification was performed by using a PCR system (BIO-RAD T-100 Thermal Cycler) with the following program: initial denaturation for 5 min at 95 °C; 34 cycles of 30 s at 94 °C, 30 s at 58–60.5 °C, and 30 s at 72 °C; and a final extension of 30 min at 72 °C. PCR products were separated in a 4% agarose gel to check the amplification, and 0.2 μL of PCR product was mixed with 9.8 μL of Hi-Di formamide (Applied Biosystems, Foster City, CA, USA) and 0.2 μL of GeneScan™ 500 LIZ® size standard (Applied Biosystems). The PCR product mixture was denatured at 95 °C for 5 min and kept on ice. Subsequently, the mixture was separated by capillary electrophoresis on the ABI 3730 DNA analyzer (Applied Biosystems) by using a 50 cm capillary array with a DS-33 install standard as a matrix. We analyzed the amplified fragments by size with GeneMapper software version 4.0 (Applied Biosystems). The number of alleles (NA), major allele frequency (MAF), observed heterozygosity (HO), expected heterozygosity (HE), and polymorphism information content (PIC) values were calculated by using PowerMarker software version 3.23 [28]. Coefficients of genetic similarity were calculated by using the molecular evolutionary genetics analysis (MEGA) program version 5.05 [29]. Genetic distance was computed by using the SharedAllele distance method with PowerMarker software version 3.23.
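
For readers unfamiliar with these statistics, the following Python sketch shows how NA, NG, MAF, HE, HO, and PIC can be computed for one locus from diploid genotype calls. It uses the textbook definitions (HE = 1 − Σp_i², PIC after Botstein et al.) without any sample-size correction; this is an assumption, and PowerMarker may apply different estimators, so values need not match Table 3 exactly. The function name `diversity_stats` and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def diversity_stats(genotypes):
    """Per-locus statistics from a list of (allele_a, allele_b) diploid genotypes."""
    alleles = [a for g in genotypes for a in g]
    counts = Counter(alleles)
    n = len(alleles)
    freqs = [c / n for c in counts.values()]

    sum_sq = sum(p * p for p in freqs)
    he = 1.0 - sum_sq  # expected heterozygosity (uncorrected gene diversity)
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    # PIC = 1 - sum(p_i^2) - sum over allele pairs i<j of 2 * p_i^2 * p_j^2
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    return {
        "NA": len(counts),                                   # number of alleles
        "NG": len({tuple(sorted(g)) for g in genotypes}),    # number of genotypes
        "MAF": max(freqs),                                   # major allele frequency
        "HE": he,
        "HO": ho,
        "PIC": pic,
    }


# Toy data: fragment sizes (bp) for four accessions at one locus.
toy = [(177, 177), (177, 180), (180, 183), (183, 183)]
for name, value in diversity_stats(toy).items():
    print(name, round(value, 3))
```

Alleles are represented by fragment size in bp, as produced by capillary electrophoresis; any hashable allele label would work equally well.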

3. Results and Discussion

3.1. Characteristics of Genomic SSR in the NGS Assemblies of A. gigas

Libraries were constructed from the five A. gigas accessions (Table 1) and subjected to paired-end sequencing with a 101 bp read length by using an Illumina HiSeq 2500 platform. The SOAPdenovo2 assembler (version 2.04) was used to assemble the A. gigas genome de novo, generating 395,007 scaffolds representing 804 Mb of the A. gigas (MV) genome. MV is a variety that was developed from A. gigas on the basis of its root hypertrophy and slow-bolting characteristics. Thus, we selected MV as a representative A. gigas variety. The other four varieties were selected for their diverse geographical origins and were used for in silico polymorphism analysis. Analysis of the k-mer frequency spectrum revealed a peak (k-mer depth: 11) in the 17-mer depth distribution of reads, suggesting low heterozygosity across the genome (Figure S2). Based on the k-mer frequency analysis, the genome size of MV was estimated to be approximately 2.67 Gb (number of k-mers divided by the k-mer depth at the peak). In total, we found 138,113 SSRs in the de novo assembly of the MV genome. The 121,112 di-nucleotide and 13,211 tri-nucleotide SSRs constituted 87.691% and 9.565% of the SSRs, respectively, followed by tetra-, penta- and hexa-nucleotide motifs (Table S2).
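
The genome-size estimate above follows from simple arithmetic, sketched below in Python. The sketch assumes that every raw MV read in Table 1 is exactly 101 bp and contributes 101 − 17 + 1 17-mers; the study counted k-mers after error correction, and that count is not reported, so the result only approximates the published 2.67 Gb. The function names are hypothetical.

```python
def kmer_count(n_reads, read_len, k):
    """Total number of k-mers contained in n_reads reads of fixed length read_len."""
    return n_reads * (read_len - k + 1)


def estimate_genome_size(total_kmers, peak_depth):
    """Genome size = number of k-mers / k-mer depth at the main peak."""
    return total_kmers / peak_depth


mv_reads = 351_885_410       # Table 1, total reads for MV (raw, assumed all 101 bp)
mv_bases = 35_540_426_410    # Table 1, total bases for MV

total_17mers = kmer_count(mv_reads, read_len=101, k=17)
size_bp = estimate_genome_size(total_17mers, peak_depth=11)
coverage = mv_bases / 2.67e9  # sequencing depth relative to the published estimate

print(f"17-mers: {total_17mers:,}")
print(f"estimated genome size: {size_bp / 1e9:.2f} Gb")
print(f"coverage at 2.67 Gb: {coverage:.1f}x")
```

With raw read counts the estimate comes out slightly above 2.67 Gb, which is the direction expected if error correction removed some reads or k-mers; the coverage figure is consistent with the "≥13×" stated in Section 2.2.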

3.2. SSR Marker Development by In Silico Polymorphism Analysis and Characterization of SSR Markers

We successfully designed 16,496 SSR primer pairs, out of 138,113 SSRs from the unique genomic regions of MV genome assembly. The majority of designed SSR primer sets was di-nucleotide (14,113, in SSR, 10.218%), followed by tri-nucleotide (2064, in SSR, 1.494%), tetra-nucleotide (242, in SSR, 0.175%), penta-nucleotide (36, in SSR, 0.026%), and hexa-nucleotide (37, in SSR, 0.027%) motifs (Table S2). Initially, five A. gigas accessions (MV, HV, GLV, JLV, and SLV) were selected to detect the polymorphism of these primer sets by in silico analysis.
The MV scaffolds including the SSRs were subjected to BLAST analysis against the assembled contigs of four other varieties (HV, GLV, JLV, and SLV). Polymorphic SSR markers showed two to five different repeat length scaffolds from the five A. gigas accessions. Consequently, a total of 848 polymorphic SSR markers were selected from the in silico analyzed tri-nucleotide to hexa-nucleotide motifs (a total of 2379 SSR primer sets).
We selected 48 polymorphic SSR markers from the 848 in silico analyzed primer pairs. These markers showed four or more different repeat-length scaffold types, or more than 15 bp of nucleotide length difference (Table S2; Figure S1), and were further used for genotyping of the five A. gigas accessions (MV, HV, GLV, JLV, and SLV). Thirty-six SSR primer sets yielded balanced, intact, reproducible, and polymorphic amplicons in 4% agarose gels. The remaining 12 SSR primer sets amplified more than two bands or showed unexpected amplicon sizes. The 36 SSR primer sets were deposited in GenBank (Table 2). Genomic regions of the 36 SSR loci were also analyzed on the basis of ab initio gene identification in the assembled sequences, and 16 (44.4%), 9 (25.0%), 9 (25.0%), and 2 (5.6%) were located in intergenic, intron, coding DNA sequence (CDS), and 5′ untranslated region (5′ UTR) sequences, respectively (Table S3).
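
The selection rule described above can be expressed as a small filter, sketched here in Python. The input format (repeat number per accession, obtained from the BLAST alignments) and the function name `is_candidate` are assumptions for illustration; the study performed this step within CLC Main Workbench, not with this code.

```python
def is_candidate(repeat_counts, motif_len, min_types=4, min_bp_diff=15):
    """Apply the in silico selection rule to one SSR locus.

    repeat_counts maps accession name -> repeat number at the locus
    (None when the locus was not found in that accession's assembly).
    """
    observed = [n for n in repeat_counts.values() if n is not None]
    types = set(observed)
    if len(types) < 2:
        return False  # monomorphic or not comparable
    bp_diff = (max(observed) - min(observed)) * motif_len
    # Keep loci with four or more repeat-length types, or a >15 bp length difference.
    return len(types) >= min_types or bp_diff > min_bp_diff


# Hypothetical tri-nucleotide loci across the five sequenced accessions.
locus_a = {"MV": 5, "HV": 6, "GLV": 7, "JLV": 8, "SLV": 5}   # four length types
locus_b = {"MV": 5, "HV": 5, "GLV": 11, "JLV": 5, "SLV": 5}  # 18 bp difference
locus_c = {"MV": 5, "HV": 6, "GLV": 5, "JLV": 5, "SLV": 5}   # polymorphic, but fails both tests

for name, locus in [("a", locus_a), ("b", locus_b), ("c", locus_c)]:
    print(name, is_candidate(locus, motif_len=3))
```

Locus c is still polymorphic in silico and would count among the 848 candidates; it is only excluded from the stricter subset chosen for wet-lab validation.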
Subsequently, 16 A. gigas accessions were used to detect polymorphisms at the 36 SSR loci after one of the primers from each primer set was labeled. Polymorphism estimation indicated that the NA per locus varied widely among the markers, ranging from 3 to 14, with an average of 6.36 alleles (Table 3). The MAF per locus was 0.16–0.63, with an average of 0.39. In addition, the HE values were 0.53–0.90, with an average of 0.73; HO values were 0.13–0.88, with an average of 0.53. PIC values were 0.44–0.89, with an average of 0.69 (Table 3). Twenty-nine of the 36 SSR markers showed a high PIC value (PIC > 0.6) and, based on our results, we constructed the genetic relationship of the 16 A. gigas accessions. Phylogenetic trees were constructed for the 16 A. gigas accessions by unweighted pair group method with arithmetic mean (UPGMA) analysis (Figure S3).
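
As an illustration of the distance measure behind the UPGMA dendrogram, the Python sketch below computes a shared-allele distance between two diploid accessions: one minus the proportion of alleles shared, averaged over loci. This is a common formulation and is assumed here; PowerMarker's exact implementation was not inspected, and the function name and toy genotypes are hypothetical.

```python
from collections import Counter


def shared_allele_distance(ind_a, ind_b):
    """Shared-allele distance between two diploid individuals.

    ind_a and ind_b are lists of (allele_1, allele_2) genotypes, one per locus,
    given in the same locus order.
    """
    if len(ind_a) != len(ind_b) or not ind_a:
        raise ValueError("both individuals need the same, non-empty set of loci")
    shared_total = 0.0
    for ga, gb in zip(ind_a, ind_b):
        ca, cb = Counter(ga), Counter(gb)
        # Alleles in common (0, 1 or 2), as a proportion of the two allele copies.
        shared_total += sum(min(ca[x], cb[x]) for x in ca) / 2
    return 1.0 - shared_total / len(ind_a)


# Two hypothetical accessions genotyped at two loci (fragment sizes in bp).
acc_1 = [(177, 180), (200, 200)]
acc_2 = [(177, 177), (200, 203)]
print(shared_allele_distance(acc_1, acc_2))
print(shared_allele_distance(acc_1, acc_1))
```

A pairwise matrix of such distances over all 16 accessions is the kind of input that average-linkage (UPGMA) clustering takes.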
The study of genetic diversity and genetic relationships of crops can assist breeders in the development of economically effective cultivars. SSR markers have been used in many areas, including linkage map development, QTL mapping, marker-assisted selection, cultivar fingerprinting, genetic diversity, gene flow, and evolutionary studies in the botanic sciences [30,31,32]. NGS technology is a modern technique used to discover large-scale genetic polymorphisms [32,33,34]. Since NGS technology allows cost-effective and higher-throughput genome sequencing, it has been applied to many crop species, including Allium fistulosum L., Sesamum indicum L., and Vicia sativa subsp. sativa [35,36]. Recently, the development of 18 polymorphic microsatellite markers for A. sinensis using NGS technology was reported, although that study focused on species identification [37]. In this study, we developed SSR markers that could be used for genetics, breeding, and diversity analyses of Angelica species.
The selection of SSR loci isolated from genome or transcriptome data depends on the data available, the project goals, and future plans [34]. Our method for mass production of SSR markers using an in silico approach resulted in 36 new polymorphic SSR markers from A. gigas. The markers developed in this study are expected to be useful in crop genetics and breeding. Even though the remaining 12 SSR markers amplified multiple bands or yielded bands of unexpected size, they were also polymorphic in sequence. Therefore, the 848 SSR marker candidates discovered through this research are expected to represent informative markers.
Even though we restricted the purpose of this study to the selection of polymorphic SSR markers for genetic diversity analysis of the genetic resources, applying the primers to more diverse Angelica species resources could broaden the usage of the developed markers. In addition, we hope that the A. gigas sequence data of this study can be used to develop various molecular markers, such as SNP and insertion-deletion (InDel) markers, in the future.

Supplementary Materials

The following are available online at www.mdpi.com/2073-4425/8/10/238/s1. Figure S1: Different repeat length scaffold types of four or more, and SSR markers of more than 15 bp nucleotide length difference, Figure S2: The k-mer depth distribution of whole-genome sequencing reads of Angelica gigas, Figure S3: Dendrogram generated using UPGMA cluster analysis based on genetic diversity of 16 Angelica gigas Nakai accessions, Table S1: List of 16 Angelica gigas accessions and information of collection sites, Table S2: Details of Manchu variety accession repeat genome assemblies, Table S3: Number of designed SSR primer sets and polymorphic SSRs discovered in silico.

Acknowledgments

This work was carried out with the support of “Cooperative Research Program for Agriculture Science & Technology Development (Project No. PJ01102202)” Rural Development Administration, Korea. This work was conducted during the research year of Chungbuk National University in 2010.

Author Contributions

O.T.K., S.-C.K., S.C.K., H.B.K. and Y.L. conceived and designed the experiments; J.G., Y.U. and S.K. performed the experiments; C.P.H., S.-G.P. and Y.L. analyzed the data; D.H.L. contributed reagents/materials/analysis tools; J.G., B.-H.J., C.S.R, J-W.C. and Y.L. wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest. The founding sponsors had no role in the design of the study; nor in the collection, analyses, or interpretation of data; in the writing of the manuscript, and in the decision to publish the results.

References

  1. Sarker, S.D.; Nahar, L. Natural medicine: The genus Angelica. Curr. Med. Chem. 2004, 11, 1479–1500.
  2. Chan, P.H.; Zhang, W.L.; Lau, C.H.; Cheung, C.Y.; Keun, H.C.; Tsim, K.W.K.; Lam, H. Metabonomic analysis of water extracts from different Angelica roots by 1H-nuclear magnetic resonance spectroscopy. Molecules 2014, 19, 3460–3470.
  3. Son, C.Y.; Baek, I.H.; Song, G.Y.; Kang, J.S.; Kwon, K.I. Pharmacological effect of decursin and decursinol angelate from Angelica gigas Nakai. Yakhak Hoeji 2009, 53, 303–313.
  4. Son, S.H.; Park, K.K.; Park, S.K.; Kim, Y.C.; Kim, Y.S.; Lee, S.K.; Chung, W.Y. Decursin and decursinol from Angelica gigas inhibit the lung metastasis of murine colon carcinoma. Phytother. Res. 2011, 25, 959–1023.
  5. Kim, M.R.; El-Aty, A.M.; Choi, J.H.; Lee, K.B.; Shim, J.H. Identification of volatile components in Angelica species using supercritical-CO2 fluid extraction and solid phase microextraction coupled to gas chromatography-mass spectrometry. Biomed. Chromatogr. 2006, 20, 1267–1273.
  6. Lee, S.; Lee, Y.S.; Jung, S.H.; Shin, K.H.; Kim, B.K.; Kang, S.S. Anti-tumor activities of decursinol angelate and decursin from Angelica gigas. Arch. Pharm. Res. 2003, 26, 727–730.
  7. Lee, S.; Lee, Y.S.; Jung, S.H.; Shin, K.H.; Kim, B.K.; Kang, S.S. Antioxidant activities of decursinol angelate and decursin from Angelica gigas roots. Nat. Prod. Sci. 2003, 9, 170–173.
  8. Lee, S.; Shin, D.S.; Kim, J.S.; Oh, K.B.; Kang, S.S. Antibacterial coumarins from Angelica gigas roots. Arch. Pharm. Res. 2003, 26, 449–452.
  9. Chen, X.B.; Xie, Y.H.; Sun, X.M. Development and characterization of polymorphic genic-SSR markers in Larix kaempferi. Molecules 2015, 20, 6060–6067.
  10. Paux, E.; Sourdille, P.; Mackay, I.; Feuillet, C. Sequence-based marker development in wheat: Advances and applications to breeding. Biotechnol. Adv. 2012, 30, 1071–1088.
  11. Park, Y.J.; Lee, J.K.; Kim, N.S. Simple sequence repeat polymorphisms (SSRPs) for evaluation of molecular diversity and germplasm classification of minor crops. Molecules 2009, 14, 4546–4569.
  12. Moose, S.P.; Mumm, R.H. Molecular plant breeding as the foundation for 21st century crop improvement. Plant Physiol. 2008, 147, 969–977.
  13. Tautz, D.; Renz, M. Simple sequences are ubiquitous repetitive components of eukaryotic genomes. Nucleic Acids Res. 1984, 12, 4127–4138.
  14. Thomson, M.J. High-throughput SNP genotyping to accelerate crop improvement. Plant Breed. Biotechnol. 2014, 2, 195–212.
  15. Saha, M.C.; Cooper, J.D.; Mian, M.A.; Chekhovskiy, K.; May, G.D. Tall fescue genomic SSR markers: Development and transferability across multiple grass species. Theor. Appl. Genet. 2006, 113, 1449–1458.
  16. Aggarwal, R.K.; Hendre, P.S.; Varshney, R.K.; Bhat, P.R.; Krishnakumar, V.; Singh, L. Identification, characterization and utilization of EST-derived genic microsatellite markers for genome analyses of coffee and related species. Theor. Appl. Genet. 2007, 114, 359–372.
  17. Wang, Z.; Fang, B.; Chen, J.; Zhang, X.; Luo, Z.; Huang, L.; Chen, X.; Li, Y. De novo assembly and characterization of root transcriptome using Illumina paired-end sequencing and development of cSSR markers in sweet potato (Ipomoea batatas). BMC Genom. 2010, 11, 726–739.
  18. Liu, G.S.; Zhang, Y.G.; Tao, R.; Fang, J.G.; Dai, H.Y. Identification of apple cultivars on the basis of simple sequence repeat markers. Genet. Mol. Res. 2014, 13, 7377–7384.
  19. Varshney, R.K.; Nayak, S.N.; May, G.D.; Jackson, S.A. Next-generation sequencing technologies and their implications for crop genetics and breeding. Trends Biotechnol. 2009, 27, 522–530.
  20. Choi, S.A.; Kim, Y.J.; Kim, K.Y.; Kim, J.H.; Seong, R.S. The complete chloroplast genome sequence of the medicinal plant, Angelica gigas (Apiaceae). Mitochondrial DNA Part B 2016, 1, 307–308.
  21. Kumar, P.; Gupta, V.K.; Misra, A.K.; Modi, D.R.; Pandey, B.K. Potential of molecular markers in plant biotechnology. Plant Omics 2009, 2, 141–162.
  22. Luo, R.; Liu, B.; Xie, Y.; Li, Z.; Huang, W.; Yuan, J.; He, G.; Chen, Y.; Pan, Q.; Liu, Y.; et al. SOAPdenovo2: An empirically improved memory-efficient short-read de novo assembler. GigaScience 2012, 1, 18.
  23. Smit, A.F.A.; Hubley, R.; Green, P. RepeatMasker Open-4.0. Available online: http://www.repeatmasker.org (accessed on 15 September 2015).
  24. Stanke, M.; Steinkamp, R.; Waack, S.; Waack, B. AUGUSTUS: A web server for gene finding in eukaryotes. Nucleic Acids Res. 2004, 32, 309–312.
  25. Jones, P.; Binns, D.; Chang, H.-Y.; Fraser, M.; Li, W.; McAnulla, C.; McWilliam, H.; Maslen, J.; Mitchell, A.; Nuka, G.; et al. InterProScan 5: Genome-scale protein function classification. Bioinformatics 2014, 30, 1236–1240.
  26. Temnykh, S.; DeClerck, G.; Lukashova, A.; Lipovich, L.; Cartinhour, S.; McCouch, S. Computational and experimental analysis of microsatellites in rice (Oryza sativa L.): Frequency, length variation, transposon associations, and genetic marker potential. Genome Res. 2001, 11, 1441–1452.
  27. Rozen, S.; Skaletsky, H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol. Biol. 2000, 132, 365–386.
  28. Liu, K.; Muse, S.V. PowerMarker: An integrated analysis environment for genetic marker analysis. Bioinformatics 2005, 21, 2128–2129.
  29. Kumar, S.; Tamura, K.; Nei, M. MEGA: Molecular evolutionary genetics analysis software for microcomputers. Bioinformatics 1994, 10, 189–191.
  30. Cavagnaro, P.F.; Senalik, D.A.; Yang, L.; Simon, P.W.; Harkins, T.T.; Kodria, C.D.; Huang, S.; Weng, Y. Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.). BMC Genom. 2010, 11, 569–586.
  31. Zhu, H.; Senalik, D.; Mccown, B.H.; Zeldin, E.L.; Speers, J.; Hyman, J.; Bassil, N.; Hummer, K.; Simon, P.W.; Zalapa, J.E. Mining and validation of pyrosequenced simple sequence repeats (SSRs) from American cranberry (Vaccinium macrocarpon Ait.). Theor. Appl. Genet. 2011, 124, 87–96.
  32. Shendure, J.; Ji, H. Next-generation DNA sequencing. Nat. Biotechnol. 2008, 26, 1135–1145.
  33. Ekblom, R.; Galindo, J. Applications of next generation sequencing in molecular ecology of non-model organisms. Heredity 2011, 107, 1–15.
  34. Zalapa, J.E.; Cuevas, H.; Zhu, H.; Steffan, S.; Senalik, D.; Zeldin, E.; Mccown, B.; Harbut, R.; Simon, P. Using next-generation sequencing approaches to isolate simple sequence repeat (SSR) loci in the plant. Am. J. Bot. 2012, 99, 193–208.
  35. Yang, L.; Wen, C.; Zhao, H.; Liu, Q.; Yang, J.; Liu, L.; Wang, Y. Development of polymorphic genic SSR markers by transcriptome sequencing in the welsh onion (Allium fistulosum L.). Appl. Sci. 2015, 5, 1050–1063.
  36. Wei, X.; Wang, L.; Zhang, Y.; Qi, X.; Wang, X.; Ding, X.; Zhang, J.; Zhang, X. Development of simple sequence repeat (SSR) markers of sesame (Sesamum indicum) from a genome survey. Molecules 2014, 19, 5150–5162.
  37. Lu, Y.; Cheng, T.; Zhu, T.; Jiang, D.; Zhou, S.; Jin, L.; Yuan, Q.; Huang, L. Isolation and characterization of 18 polymorphic microsatellite markers for the “Female Ginseng” Angelica sinensis (Apiaceae) and cross-species amplification. Biochem. Syst. Ecol. 2015, 61, 488–492.

Table 1. Genome sequencing of five Angelica gigas accessions.

| | MV | HV | GLV | JLV | SLV |
|---|---|---|---|---|---|
| Total Reads | 351,885,410 | 153,890,676 | 112,354,576 | 139,011,848 | 142,791,294 |
| Total Bases | 35,540,426,410 | 15,542,958,276 | 11,347,812,176 | 14,040,196,648 | 14,421,920,694 |
| Number of Scaffolds | 395,007 | 541,229 | 468,810 | 526,114 | 546,442 |
| Assembled genome size (bp) | 804,583,850 | 298,362,936 | 242,460,776 | 273,337,815 | 298,605,080 |

MV, Manchu variety; HV, Hwangje variety; GLV, Gangwon local variety; JLV, Jecheon local variety; SLV, Sancheong local variety.

Table 2. Characterization of 36 polymorphic simple sequence repeat (SSR) markers validated in 16 A. gigas accessions.

| Marker | GenBank No. | Motif | Forward primer (F, with dye label) | Reverse primer (R) | Annealing Temperature (°C) | Allele Size (bp) |
|---|---|---|---|---|---|---|
| YL-AGN tri0110 | KX138563 | (AAT)5 | FAM-TATGTCGGCAACACATGC | GTTTGGCTAGGCTAGACTAATGTGGGT | 58 | 177–186 |
| YL-AGN tri0221 | KX138564 | (ACC)7 | FAM-CGACACCAGTGTTGATCCAT | GTTTGTATCGAGTCCATAACGCTCAG | 58 | 179–194 |
| YL-AGN tri0303 | KX138565 | (AGC)5 | FAM-AGGCGATAGAATCCCCAA | GTTTCGAGAAACAAAGCTTCAGGG | 58 | 157–175 |
| YL-AGN tri0336 | KX138566 | (AGG)8 | FAM-GTGGTGTTCTTTTCTCCACG | GTTTCTTGTCGTCTTCTCGTCTCTCT | 58 | 185–203 |
| YL-AGN tri0359 | KX138567 | (AGT)8 | FAM-CGTCGGCTAGTAACTGCAAA | GTTTATGTCGCGGTGATTTACG | 58 | 188–194 |
| YL-AGN tri0379 | KX138568 | (ATA)15 | FAM-GCTGTTCTGGCGTATGTACTTC | GTTTACTCCCAACAAACACTGCTC | 58 | 161–191 |
| YL-AGN tri0537 | KX138569 | (ATG)7 | FAM-GGTCCTCAGCTTTCTCAGAATC | GTTTCCTGATATCTCCCTGCAATC | 58 | 175–217 |
| YL-AGN tri0638 | KX138594 | (ATT)7 | PET-AGCATAGGATCCAGCTTGTG | GTTTTTTCGAGATGGAGTCTCAGC | 58 | 134–161 |
| YL-AGN tri0685 | KX138570 | (CAC)5 | FAM-AGTAAGCAAGTGATGCCGAG | GTTTGGGGGTTTTGTAGTGGTTCA | 58 | 179–197 |
| YL-AGN tri0832 | KX138571 | (CCA)5 | FAM-CCACTTTACTCCCCTGATAAGC | GTTTTTCCTGCCAGCTCTGATTAC | 58 | 163–184 |
| YL-AGN tri0861 | KX138572 | (CCT)5 | NED-CTGGTGGCTCTTAAGTTACTCG | GTTTCGGTTACTACAGTACGTTGGTTGC | 58 | 193–230 |
| YL-AGN tri0889 | KX138573 | (CCT)5 | NED-GACAAAGAAGCAGTGGCATC | GTTTGAGATTACGACGAGCGAGAA | 58 | 193–202 |
| YL-AGN tri0953 | KX138595 | (CTG)16 | PET-TGTTTGCACCAGCTCTCA | GTTTCCACCTCAAGGTTCAATGAC | 58 | 180–209 |
| YL-AGN tri0955 | KX138596 | (CTG)9 | PET-TAGCCAAGACCAGCTCAATC | GTTTTACTGCTACAATGGCACACC | 58 | 144–167 |
| YL-AGN tri0957 | KX138574 | (CTG)6 | NED-CAGTTCCGTTGTTCCAACTC | GTTTGAAAGCGAACGAGAGATGAG | 58 | 188–203 |
| YL-AGN tri1043 | KX138575 | (GAA)8 | NED-GCTGGATTATCACCTTCACG | GTTTGAAGAGGTAATGTGGGGTGTAG | 58 | 191–204 |
| YL-AGN tri1092 | KX138597 | (GAA)14 | PET-CTAGCTTTCCCATGTCTGAACC | GTTTCATCCATGCACCAATGTC | 58 | 143–201 |
| YL-AGN tri1102 | KX138576 | (GAC)5 | NED-ACTCAAAAGGACAAGTCCCC | GTTTCCTTCCACTCTCTGGTTGTAGAC | 58 | 150–200 |
| YL-AGN tri1118 | KX138577 | (GAG)5 | NED-AACTGATTGGGAGGAGTAGGAG | GTTTCCTGAAAGAGTACTAACACCCG | 58 | 168–183 |
| YL-AGN tri1174 | KX138578 | (GAT)6 | NED-AGCAACTAGCTCACTCACAACC | GTTTAGACGAGTTAAGGGACTTGC | 58 | 196–210 |
| YL-AGN tri1211 | KX138579 | (GCA)8 | NED-ATGCAACAATGTCTCGGC | GTTTCAACTTGGGTTCTGCCCTAT | 58 | 162–180 |
| YL-AGN tri1262 | KX138580 | (GCT)7 | NED-GTTACATTGAGTCCTCGTAGGG | GTTTATCACGAACCAGAACCCA | 58 | 171–199 |
| YL-AGN tri1269 | KX138581 | (GCT)5 | VIC-CCCTTGACACTCACTACTCTCA | GTTTTTCTCTCTGCTACCCTTAGAGC | 58 | 202–217 |
| YL-AGN tri1276 | KX138598 | (GCT)5 | PET-GATTGCTGCTGCTGAGTTG | GTTTGAACTCTCCGATGGTGTTGTAG | 58 | 156–177 |
| YL-AGN tri1358 | KX138582 | (GTA)6 | VIC-GACAGCCAGCCTTCTTCAT | GTTTCCCTACATTGCGTTGATCC | 58 | 182–194 |
| YL-AGN tri1582 | KX138583 | (TCA)8 | VIC-GGTACGGGAATATGAGACAGG | GTTTGTGATCTTGCTGATGACGG | 58 | 183–206 |
| YL-AGN tri1708 | KX138584 | (TCT)10 | VIC-GTGGTGCTGCCAATGTTT | GTTTGAAGATGTCGGCAGATCAGT | 58 | 172–241 |
| YL-AGN tri1849 | KX138585 | (TGG)7 | VIC-GTCCAATGATTCTGCTGCTC | GTTTATATACTGGGAGTGTTGCGGAG | 60.5 | 174–183 |
| YL-AGN tri1920 | KX138586 | (TTA)8 | VIC-TACAGTCGAGTTCTGGACACAC | GTTTCTTTCTCCGTGAACATGTCG | 58 | 179–200 |
| YL-AGN tetra2116 | KX138587 | (AGAT)9 | VIC-GTCACTAAAACACGAGACTGCC | GTTTCCAGAACTCGTTCCGAATC | 58 | 179–250 |
| YL-AGN tetra2173 | KX138588 | (ATTT)5 | VIC-AGGCATGCACTACTCCTCTATC | GTTTATCTCGCAGCTATGAGACTACC | 58 | 179–191 |
| YL-AGN tetra2208 | KX138589 | (GATA)6 | VIC-GCTGGGTTTAGGTTTCTGGA | GTTTACCGCAAACACCTAGTACTCC | 60.5 | 157–177 |
| YL-AGN tetra2275 | KX138590 | (TGTA)7 | PET-CATCATCTTGCAAGGTCCAC | GTTTTCTCCCAAACTGGTACTCTG | 58 | 164–188 |
| YL-AGN Hexa2350 | KX138591 | (AGAATC)6 | PET-GCTAGCAATAGCAGGTTGAC | GTTTATAGACCTGGTTTCGGGC | 58 | 193–210 |
| YL-AGN Hexa2367 | KX138592 | (GATCTC)5 | PET-ACGTGACCACCATATTGC | GTTTGTGGACACTGTTTCGTCACTG | 58 | 187–199 |
| YL-AGN Hexa2374 | KX138593 | (TCATGC)5 | PET-GGCAGACGTTCTGGTTTTC | GTTTCCGTGAGTGGTAGGGAAATA | 58 | 164–176 |

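Table 3. Diversity statistics from initial primer screening in 16 A. gigas accessions.

| No. | Marker | MAF | NG | NA | HE | HO | PIC |
|---|---|---|---|---|---|---|---|
| 1 | YL-AGNtri0110 | 0.47 | 7 | 4 | 0.68 | 0.67 | 0.63 |
| 2 | YL-AGNtri0221 | 0.25 | 11 | 6 | 0.80 | 0.63 | 0.77 |
| 3 | YL-AGNtri0303 | 0.50 | 8 | 5 | 0.67 | 0.50 | 0.62 |
| 4 | YL-AGNtri0336 | 0.34 | 11 | 5 | 0.77 | 0.50 | 0.73 |
| 5 | YL-AGNtri0359 | 0.57 | 4 | 3 | 0.56 | 0.33 | 0.48 |
| 6 | YL-AGNtri0379 | 0.28 | 10 | 7 | 0.78 | 0.44 | 0.75 |
| 7 | YL-AGNtri0537 | 0.53 | 9 | 7 | 0.67 | 0.56 | 0.64 |
| 8 | YL-AGNtri0638 | 0.33 | 9 | 5 | 0.75 | 0.40 | 0.71 |
| 9 | YL-AGNtri0685 | 0.44 | 10 | 7 | 0.73 | 0.69 | 0.70 |
| 10 | YL-AGNtri0832 | 0.38 | 8 | 7 | 0.76 | 0.19 | 0.73 |
| 11 | YL-AGNtri0861 | 0.31 | 9 | 10 | 0.84 | 0.50 | 0.82 |
| 12 | YL-AGNtri0889 | 0.59 | 4 | 3 | 0.53 | 0.19 | 0.44 |
| 13 | YL-AGNtri0953 | 0.31 | 13 | 10 | 0.82 | 0.75 | 0.81 |
| 14 | YL-AGNtri0955 | 0.44 | 8 | 5 | 0.72 | 0.69 | 0.68 |
| 15 | YL-AGNtri0957 | 0.31 | 9 | 5 | 0.75 | 0.44 | 0.71 |
| 16 | YL-AGNtri1043 | 0.63 | 7 | 5 | 0.56 | 0.44 | 0.52 |
| 17 | YL-AGNtri1092 | 0.22 | 15 | 11 | 0.87 | 0.63 | 0.86 |
| 18 | YL-AGNtri1102 | 0.34 | 11 | 8 | 0.79 | 0.44 | 0.77 |
| 19 | YL-AGNtri1118 | 0.34 | 9 | 4 | 0.71 | 0.69 | 0.66 |
| 20 | YL-AGNtri1174 | 0.38 | 9 | 5 | 0.71 | 0.31 | 0.66 |
| 21 | YL-AGNtri1211 | 0.41 | 10 | 6 | 0.75 | 0.50 | 0.71 |
| 22 | YL-AGNtri1262 | 0.31 | 10 | 8 | 0.80 | 0.88 | 0.77 |
| 23 | YL-AGNtri1269 | 0.50 | 8 | 6 | 0.68 | 0.75 | 0.64 |
| 24 | YL-AGNtri1276 | 0.25 | 13 | 8 | 0.84 | 0.56 | 0.82 |
| 25 | YL-AGNtri1358 | 0.47 | 9 | 5 | 0.67 | 0.44 | 0.62 |
| 26 | YL-AGNtri1582 | 0.31 | 11 | 8 | 0.82 | 0.75 | 0.80 |
| 27 | YL-AGNtri1708 | 0.16 | 14 | 14 | 0.90 | 0.50 | 0.89 |
| 28 | YL-AGNtri1849 | 0.63 | 3 | 4 | 0.54 | 0.13 | 0.48 |
| 29 | YL-AGNtri1920 | 0.31 | 13 | 8 | 0.81 | 0.75 | 0.79 |
| 30 | YL-AGNtri2275 | 0.41 | 9 | 6 | 0.73 | 0.69 | 0.69 |
| 31 | YL-AGNtetra2116 | 0.22 | 13 | 14 | 0.88 | 0.75 | 0.87 |
| 32 | YL-AGNtetra2173 | 0.41 | 8 | 4 | 0.69 | 0.75 | 0.63 |
| 33 | YL-AGNtetra2208 | 0.41 | 7 | 5 | 0.67 | 0.25 | 0.61 |
| 34 | YL-AGNhexa2350 | 0.44 | 7 | 4 | 0.64 | 0.56 | 0.57 |
| 35 | YL-AGNhexa2367 | 0.46 | 6 | 4 | 0.64 | 0.38 | 0.57 |
| 36 | YL-AGNhexa2374 | 0.44 | 6 | 3 | 0.63 | 0.44 | 0.56 |
| Mean | | 0.39 | 9.1 | 6.4 | 0.73 | 0.53 | 0.69 |
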
MAF, major allele frequency; NG, number of genotypes; NA, number of alleles; HE, expected heterozygosity; HO, observed heterozygosity; PIC, polymorphism information content.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).