Exploiting Illumina Sequencing for the Development of 95 Novel Polymorphic EST-SSR Markers in Common Vetch (Vicia sativa subsp. sativa)

The common vetch (Vicia sativa subsp. sativa), a self-pollinating and diploid species, is one of the most important annual legumes in the world due to its short growth period, high nutritional value, and multiple uses as hay, grain, silage, and green manure. The available simple sequence repeat (SSR) markers for common vetch, however, are insufficient to meet the growing demand for genetic and molecular research on this important species. Here, we aimed to develop and characterise polymorphic EST-SSR markers from the vetch Illumina transcriptome. A total of 1,071 potential EST-SSR markers were identified from 1,025 unigenes whose lengths were greater than 1,000 bp, and 450 primer pairs were then designed and synthesized. Finally, 95 polymorphic primer pairs were developed for the 10 common vetch accessions, which included 50 individuals. Among the 95 EST-SSR markers, the number of alleles ranged from three to 13, and the polymorphism information content values ranged from 0.09 to 0.98. The observed heterozygosity values ranged from 0.00 to 1.00, and the expected heterozygosity values ranged from 0.11 to 0.98. These 95 EST-SSR markers developed from the vetch Illumina transcriptome could greatly promote genetic and molecular breeding studies in this species.


Introduction
The common vetch (Vicia sativa subsp. sativa) is an important forage legume crop that is commonly used as green manure, pasture, silage, and hay. The nutritional potential of its seeds is universally recognised, as they contain high levels of protein, starch, and oil [1,2]. The vetch also fixes atmospheric nitrogen through its symbiotic relationship with rhizobia. This species is a diploid with 2n = 2x = 12 chromosomes and a relatively small genome (2,205 Mb) compared with other Vicia species [3,4].
The development and use of codominant molecular markers has increased remarkably in the last decade. Codominant markers, which are locus-specific and multi-allelic, have applications in genetic diversity studies, cultivar identification, evolution, linkage mapping, QTL mapping, comparative genomics, and marker-assisted selection breeding. Traditional approaches to SSR marker development, based on hybridization of probes containing repeated motifs against genomic or cDNA libraries followed by DNA sequencing, are time-consuming and resource-intensive [5]. In the last few years, emphasis has shifted towards the development of SSR markers from the transcribed regions of the genome. There are three prominent advantages to using expressed transcripts, rather than genomic sequences, as molecular markers. First, expressed sequence tag (EST)-derived markers are more likely to be embedded in functional gene sequences, which allows them to act as "functional genetic markers" for rapidly establishing marker-trait linkages and identifying genes or quantitative trait loci (QTLs) for traits of agricultural importance in crop plants [6]. EST-SSRs can therefore provide opportunities for gene discovery and enhance the role of genetic markers by assaying variation in transcribed genes of known function. Second, EST-SSR markers are likely to be more highly conserved and may be more transferable between closely related species than markers derived from genomic sequences. Third, ESTs can be used for homology-based mapping, which is useful for aligning genome linkage maps across distantly related species for comparative analysis [7]. EST-SSR markers may also lead to direct gene tagging for QTL mapping of agronomically important traits, as well as increase the efficiency of marker-assisted selection [8].
Therefore, the development and characterisation of EST-SSR markers has become quite extensive in a wide range of plant species [1,9-16].
Recently developed high-throughput sequencing technologies (i.e., next-generation sequencing) performed on, for example, the Illumina Genome Analyser or the Roche/454 Genome Sequencer FLX Instrument, are powerful and cost-efficient tools for use on non-model organisms, especially in the development of EST-SSR markers [1,9-16]. During the past several years, next-generation sequencing for non-model organisms was largely confined to the Roche/454 instrument due to its longer reads [1,8,9,14]; however, several recent studies have demonstrated the feasibility of both 454 and Illumina technologies for the isolation of SSRs or EST-SSRs [1,9-16]. Due to its high coverage and low cost, Illumina sequencing is widely used in transcriptome sequencing. Using 454 pyrosequencing technology, 65 and 49 polymorphic EST-SSR markers have been developed in V. sativa subsp. sativa and V. sativa subsp. nigra, respectively [1,9]. Compared with some other plants (e.g., peanut, for which 1,281 polymorphic EST-SSR markers are available [17]), this number is low, and the existing EST-SSR markers for common vetch are insufficient to meet the growing demand for genetic and molecular research on this plant.
In our previous study, we sequenced common vetch transcriptomes using Illumina technology and deposited the data under NCBI accession No. GSE35437 [13]. Here, 1,071 potential EST-SSRs were identified from 1,025 unigenes whose lengths were greater than 1,000 bp. By exploiting the Illumina sequencing databases of common vetch, we aimed to develop and characterise 95 novel polymorphic EST-SSR markers to promote studies on molecular diversity and breeding programs of this species.

Results and Discussion
Illumina paired-end sequencing of the vetch transcriptome produced 43,973,369 raw sequence reads. All high-quality reads were assembled de novo using the Trinity program [12], yielding 44,582 unigenes. We screened EST-SSR loci only from unigenes with lengths greater than 1,000 bp. Overall, 1,071 potential EST-SSR loci were identified in 1,025 unigenes and analysed. As shown in Table 1, the most common number of repeat units among the potential EST-SSR loci was five, which accounted for 63.59% (681), followed by six (23.90%; 256) and seven (7.00%; 75). Di- to penta-nucleotide motifs were further analysed to determine the number of repeat units. A total of 41 potential EST-SSRs contained more than 13 repeat units, and all of these motifs were tri-nucleotide repeats. EST-SSRs, which consist of varying numbers of tandemly repeated di-, tri-, or tetra-nucleotide DNA motifs, are one of the most popular marker systems. As indicated in Table 1, tri-nucleotide repeats were the most abundant type (97.20%; 1,041), followed by di- (2.15%; 23), penta- (0.56%; 6), and tetra-nucleotide (0.09%; 1) repeats. Previous research has shown that SSR occurrence in coding regions seems to be limited by non-perturbation of the reading frame, and that tri- and hexa-nucleotide repeats are dominant in protein-coding exons of all taxa. Such dominance of triplets over other repeats in coding regions may be explained by the suppression of non-trimeric SSRs in coding regions, possibly because they cause frameshift mutations [18,19]. The relative proportions of EST-SSR motif types observed in vetch were similar to those seen in other plant species, such as Ma bamboo (Dendrocalamus latiflorus) [11] and alfalfa (Medicago sativa) [12].
Table 1. Length distribution of EST-SSR markers based on the number of repeat units.

Columns: No. of Repeat Units, Di-, Tri-, Tetra-, Penta-, Total.

[...] (Table 2), were used for PCR amplification. Of the 450 primer pairs, 357 pairs were able to amplify PCR products from vetch genomic DNA, while 93 primer pairs failed to amplify PCR products at various annealing temperatures and Mg2+ concentrations, possibly due to the amplification of genomic DNA, the location of the primer across splice sites, large introns, chimeric primers, or poor-quality sequences, and were thus excluded from further analysis. Among the 357 successful primer pairs, 115 generated PCR products of the expected sizes, and 124 generated PCR products that were larger than expected, indicating the likely presence of an intron within the amplicons. The other 118 primer pairs generated products that were smaller than expected, indicating either the occurrence of deletions within the genomic sequences, a lack of specificity, or the possibility of assembly errors [12]. Of the 115 primer pairs that amplified PCR products of the expected sizes, 20 produced only one band, which may be a result of either the primer design or the homozygosity of the loci in the vetch germplasm. The remaining 95 primer pairs consistently produced more than one band among the 10 vetch accessions and 50 individual plants, which could be attributed to the high diversity of these 95 loci among the 50 individuals. To determine whether the 95 EST-SSR markers developed in this study were novel, we searched the primer sequences of molecular markers previously reported in vetch against the target regions selected to design EST-SSR primers [1,9]. The BLAST results indicated that our 95 EST-SSR markers in vetch had not been previously reported. Information regarding the 95 novel EST-SSR primers is shown in Table 3 and Table S1.
EST-SSR repeats at different locations within gene sequences may have different putative functions. For example, SSR variations in the 5'-UTR could regulate gene transcription and translation; SSR variations in the 3'-UTR could cause transcription slippage and produce expanded mRNA; and SSR variations in the coding regions should be subjected to much stronger selective pressure than those in other regions [10]. In the present study, 93 EST-SSR variations were found in the coding regions, while 2 were found in genes not associated with known proteins (Table S1).
Table 3. Characteristics of the 95 EST-SSR markers in vetch (Vicia sativa subsp. sativa).
For the 95 polymorphic EST-SSR loci, the number of alleles (NA) per locus varied widely among the markers (Table 3 and [...]).
The dendrogram showed that the 50 individual vetch plants fell into five distinct clusters (Figure 2). Accessions in cluster 1 originated from West Asia, cluster 2 from Europe, cluster 3 from West Asia, cluster 4 from Africa, and cluster 5 from Asia and Europe. Individual plants No. 40-45, from a Greek accession, clustered with a Chinese vetch cultivar (No. 46-50), indicating that the two accessions may share some genetic background. Moreover, only individual plant No. 4 did not cluster with its own group, which may be a result of genetic variation within accessions. In previous studies, researchers constructed dendrograms of V. sativa subsp. sativa and V. sativa subsp. nigra and found no clear relationship between the clustering pattern and geographical distance [1,9]. Our clustering result is similar to those reported in the two previous studies, suggesting that a greater number of accessions from close geographical locations will be needed to verify our present conclusion in future studies.

Plant Material
The common vetch seeds of 10 accessions (Table 2) were selected from the United States Department of Agriculture National Plant Germplasm System (NPGS). Seedlings were grown in a greenhouse at Lanzhou University in Lanzhou, China for approximately 45 days under a 16 h light/8 h dark cycle at 22 °C. Five individual plants from each of the 10 vetch accessions were used for polymorphism investigations of the selected EST-SSR markers. Genomic DNA was extracted from young leaves according to an established cetyltrimethylammonium bromide (CTAB) method [20], and DNA quantity and quality were assessed using a NanoDrop ND1000 instrument (Thermo Scientific, Wilmington, DE, USA).

Detection of EST-SSR Markers and Primer Design
The Illumina sequencing data for 44,582 unigenes had been previously deposited in the NCBI Gene Expression Omnibus with accession No. GSE35437 [13]. EST-SSRs were detected in the 44,582 vetch unigenes using the Simple Sequence Repeat Identification Tool program [12]. Only unigenes longer than 1000 bp were included in the EST-SSR detection. The parameters were adjusted to identify perfect di-, tri-, tetra-, and penta-nucleotide motifs with a minimum of 6, 5, 5, and 5 repeats, respectively. The EST-SSR primers were designed using Batch Primer3 [12].
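The detection criteria above can be illustrated with a minimal Python sketch. It is not the Simple Sequence Repeat Identification Tool used in this study; the function names (`is_primitive`, `find_perfect_ssrs`, `detect_est_ssrs`) and the left-to-right scanning strategy are assumptions made for illustration, and only the repeat thresholds and the 1000 bp length cut-off come from the text.

```python
# Minimum number of repeat units per motif length, as stated in the text:
# di- >= 6, tri- >= 5, tetra- >= 5, penta- >= 5.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5}
MIN_UNIGENE_LENGTH = 1000  # only unigenes longer than this are scanned


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. not AA or AGAG)."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_perfect_ssrs(sequence):
    """Return (start, motif, repeat_count) tuples for perfect SSRs, scanning left to right."""
    seq = sequence.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k, min_rep in MIN_REPEATS.items():
            motif = seq[i:i + k]
            if len(motif) < k or not set(motif) <= set("ACGT") or not is_primitive(motif):
                continue
            # Count how many times the motif repeats back to back from position i.
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_rep and (best is None or n * k > best[2] * len(best[1])):
                best = (i, motif, n)
        if best is None:
            i += 1
        else:
            hits.append(best)
            i += len(best[1]) * best[2]  # jump past the repeat tract
    return hits


def detect_est_ssrs(unigenes):
    """Map unigene id -> SSR hits, skipping unigenes of 1000 bp or shorter."""
    results = {}
    for name, seq in unigenes.items():
        if len(seq) > MIN_UNIGENE_LENGTH:
            hits = find_perfect_ssrs(seq)
            if hits:
                results[name] = hits
    return results
```

Under these assumptions, `find_perfect_ssrs("TT" + "AG" * 6 + "CC")` reports a single dinucleotide locus, `(2, "AG", 6)`, whereas a tract of only five AG units is not reported because it falls below the dinucleotide threshold.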
The following parameters were used for primer design: (1) primer length between 18 and 24 bp, with 20 bp as the optimum; (2) PCR product size from 100 to 250 bp; (3) annealing temperature from 50 to 60 °C and with an optimum annealing temperature of 55 °C; (4) GC content between 45% and 55%, with 50% as the optimum.
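As a rough illustration of how these design constraints act as a filter, the sketch below checks a candidate primer pair against the four ranges listed above. It does not reproduce Batch Primer3: the function names are hypothetical, and the melting temperatures are assumed to be supplied by the caller (for example from the primer design output) rather than computed here.

```python
def gc_percent(primer):
    """GC content of a primer as a percentage of its length."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def primer_pair_passes(forward, reverse, product_size, tm_forward, tm_reverse):
    """Check a primer pair against the ranges given in the text.

    tm_forward and tm_reverse are temperatures in degrees Celsius supplied
    by the caller (an assumption of this sketch; they are not computed here).
    """
    for primer, tm in ((forward, tm_forward), (reverse, tm_reverse)):
        if not 18 <= len(primer) <= 24:           # (1) primer length
            return False
        if not 50.0 <= tm <= 60.0:                # (3) annealing temperature
            return False
        if not 45.0 <= gc_percent(primer) <= 55.0:  # (4) GC content
            return False
    return 100 <= product_size <= 250             # (2) PCR product size
```

For example, two 20 bp primers with 50% GC content, temperatures of 55 °C, and a 180 bp product would pass, while the same pair with a 300 bp product would be rejected.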

PCR Amplification and Diversity Analysis
PCR amplification was performed in a 10 μL reaction volume containing 50 ng of genomic DNA, 1× PCR buffer, 2.0 mM MgCl2, 2.5 mM dNTPs, 4.0 µM each primer, and 0.8 U Taq polymerase (TaKaRa, Kyoto, Japan). The cycling parameters were 94 °C for 3 min and 38 cycles of the following: 94 °C for 35 s, optimal annealing temperature (Table 3) for 35 s, and 72 °C for 35 s, followed by a final extension at 72 °C for 7 min. The PCR products were subjected to electrophoresis on 8.0% non-denaturing polyacrylamide gels and then stained with ethidium bromide [12]. The band sizes were determined by comparison with the DL 500 DNA marker (TaKaRa, Kyoto, Japan). The observed heterozygosity (HO), expected heterozygosity (HE), and PIC were calculated as previously described [1]. Cluster analysis was performed to generate a dendrogram using the unweighted pair-group method with arithmetic mean (UPGMA) and Nei's unbiased genetic distance with the NTSYSPC 2.0 software package [21].
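The diversity indexes can be sketched for a single locus as follows. The exact formulas used in this study are those of reference [1] and are not restated here, so the code assumes the commonly used definitions: HO as the fraction of heterozygous individuals, HE as 1 minus the sum of squared allele frequencies (without a small-sample correction), and PIC as HE minus the sum over allele pairs of 2·p_i²·p_j². The function name and the genotype encoding (a pair of allele sizes per diploid plant) are hypothetical.

```python
from collections import Counter


def locus_statistics(genotypes):
    """Compute allele number, HO, HE and PIC for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual,
    e.g. band sizes in bp scored from the gel (an assumed encoding).
    """
    n_individuals = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n_individuals
    freqs = [count / total_alleles for count in allele_counts.values()]

    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n_individuals

    # Expected heterozygosity (assumed definition): 1 - sum(p_i^2).
    he = 1.0 - sum(p * p for p in freqs)

    # PIC (assumed definition): HE minus the sum over allele pairs of 2 * p_i^2 * p_j^2.
    pair_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = he - pair_term

    return {"NA": len(freqs), "HO": ho, "HE": he, "PIC": pic}
```

With two equally frequent alleles and half of the plants heterozygous, for instance `[(100, 100), (100, 104), (104, 104), (100, 104)]`, this gives NA = 2, HO = 0.5, HE = 0.5, and PIC = 0.375.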

Conclusions
By exploiting the Illumina sequencing database, we developed 95 novel EST-SSR markers, which were then successfully used to investigate the genetic diversity among 10 vetch accessions. To date, this represents the largest known number of vetch SSR markers developed in a single study. These 95 EST-SSR markers with relatively high degrees of polymorphism could be applied to a range of studies involving genetic diversity, cultivar identification, evolution, linkage mapping, QTL mapping, comparative genomics, and marker-assisted selection breeding of common vetch.