Development and Molecular Characterization of 55 Novel Polymorphic cDNA-SSR Markers in Faba Bean (Vicia faba L.) Using 454 Pyrosequencing

Faba bean (Vicia faba L.) is a major food and fodder legume, well known for its high seed-protein content. It plays a critical role in crop rotation and fixes nitrogen effectively. Polymorphic simple sequence repeat (SSR) markers derived from transcript sequences (cDNA-SSRs) were developed for faba bean. We identified 1,729 SSR loci among 81,333 individual sequence reads, and 240 primer pairs were designed and synthesized. In total, 55 primer pairs were polymorphic and could be scored consistently when screened in 32 accessions. The number of alleles ranged from 2 to 15, the major allele frequency per locus from 0.17 to 0.91, the number of genotypes from 2 to 17, the observed and expected heterozygosity from 0.00 to 0.44 and from 0.17 to 0.89, respectively, and the PIC values from 0.16 to 0.88. These markers will be useful tools for assessing the genetic diversity and understanding the population structure and breeding patterns of faba bean.


Introduction
Faba bean (Vicia faba L.) is currently the third most important winter-season food legume globally. It represents an important source of dietary protein for humans, as well as of edible oil and animal feed. Its critical role in crop rotation, effective nitrogen fixation, soil improvement abilities, and contribution to reducing energy input costs have long been recognized. Faba bean is a diploid with 2n = 2x = 12 chromosomes, is partially cross-pollinated (ranging from 4 to 84%), and possesses one of the largest genomes among crop legumes (~13,000 Mb) [1,2]. Although faba bean is an alternative source of protein for humans and fixes nitrogen effectively, few molecular markers are available for it: only 100 microsatellite (simple sequence repeat; SSR) markers [3,4] and only 32 EST-SSR markers [5,6] have been reported. More reliable and informative molecular markers are needed to improve our understanding of faba bean.
Next-generation transcriptome sequencing is an efficient means of generating resources for the development of cDNA-derived simple sequence repeat (cDNA-SSR) markers. cDNA-SSR markers have some intrinsic advantages over genomic SSRs: direct association with transcribed genes, low development cost, and a higher level of transferability to related species [7]. cDNA-SSRs are also reported to be more polymorphic than EST-derived SSR markers [8]. In a recent study, the authors sequenced faba bean transcriptomes using 454 pyrosequencing and identified 1,729 SSR loci among 81,333 individual sequence reads; a limited number of sequences (240) were submitted to GenBank, which paved the way for microsatellite marker development. In the present study, we developed and characterized polymorphic cDNA-SSR markers for V. faba based on these sequences to facilitate studies of the molecular diversity of this species.

Results and Discussion
Transcriptome sequencing of V. faba on the GS-FLX sequencer yielded 29.61 Mb of sequence, and GS De Novo reported 81,333 raw sequencing reads. SSRs are one of the most popular marker systems and consist of varying numbers of tandemly repeated di-, tri-, or tetra-nucleotide DNA motifs. To identify SSR markers, we used the ARGOS program with default settings on the V. faba unigene collections. In total, 1,729 potential SSR motifs were identified, and the majority were trinucleotide (67.61%) and dinucleotide (19.08%) repeats. All other types of SSRs, such as tetra-, penta-, and hexa-nucleotide motifs, were relatively rare (13.3% combined). Most trinucleotide SSRs had the GAA/AAG/AGA motif, followed by those with the TGG/CGT/GGT motif and those with the CTT/TTC/TCT motif. The GA/AG, AT/TA, and GT/TG motifs were identified among the dinucleotide cDNA-SSRs. The relative proportions of SSR motif types in faba bean [9] were similar to those in other plant species [10-12].
Among the identified SSR loci, we selected 240 sequences, which were deposited in GenBank (accession numbers KC218573-KC218812). Of the 240 primer pairs, only 55 produced consistent, polymorphic amplification products (Table 1). These 55 cDNA-SSR loci were screened in 32 accessions. The number of alleles (NA) per locus varied widely among the markers (Table 2), ranging from 2 to 15 with an average of 6.0. The major allele frequency (MAF) per locus varied from 0.17 to 0.91 with an average of 0.563, and the number of genotypes (NG) ranged from 2 to 17 with an average of 6.3. The observed heterozygosity (HO) ranged from 0.00 to 0.44 with an average of 0.074, the expected heterozygosity (HE) ranged from 0.17 to 0.89 with an average of 0.587, and the PIC values ranged from 0.16 to 0.88 with an average of 0.550. Similar values were previously reported for V. faba [6]. The cDNA-SSR markers developed in our study should be useful tools for further studies on the molecular diversity and population structure of faba bean.

Plant Material
Faba bean seeds were obtained from the National Agrobiodiversity Center, Rural Development Administration, Suwon, Korea. Seeds were germinated and seedlings were grown in a glasshouse. The leaves of young seedlings were used to extract the mRNA required to synthesize the cDNA library and for 454 sequencing.

cDNA Preparation
V. faba leaves were frozen in liquid nitrogen and ground to a fine powder, and total RNA was extracted using an RNeasy Plant Mini kit (Qiagen, Valencia, CA, USA) following the manufacturer's instructions. The integrity of total RNA was determined using a BIOSPEC-NANO spectrophotometer (Shimadzu, Kyoto, Japan) and agarose gel electrophoresis. mRNA was purified using the PolyATract mRNA Isolation System IV (Promega, Madison, WI, USA), and the purified products were used to synthesize full-length cDNAs using a ZAP-cDNA Synthesis kit (Stratagene, Santa Clara, CA, USA). Finally, cDNA was fragmented by nebulization for library construction.

Library Preparation
Approximately 1 µg of cDNA was used to generate a DNA library following the rapid library preparation method manual (Roche Life Science Inc., Branford, CT, USA). The cDNA fragment ends were polished (blunted), and two short adapters were ligated to both ends according to standard procedures described previously. The adapters provided priming sequences for both amplification and sequencing of the sample library fragments, as well as the sequencing key, a short sequence of four nucleotides used by the system's software for base calling. Following repair of any nicks in the double-stranded library, the unbound strand of each fragment was released (with 5-Adaptor A). Finally, the quality of this single-stranded template DNA library was assessed using a 2100 BioAnalyzer (Agilent, Waldbronn, Germany). The library was quantified to determine the optimal amount needed as input for emulsion-based clonal amplification.

454 Pyrosequencing
Single effective copies of template species from the DNA library to be sequenced were hybridized to DNA capture beads. The immobilized library was then resuspended in an amplification solution, and the mixture was emulsified, followed by polymerase chain reaction (PCR) amplification. The DNA-carrying beads were recovered from the emulsion and enriched after amplification. The second strands of the amplified products were melted away, leaving the amplified single-stranded DNA library bound to the beads. The sequencing primer was then annealed to the immobilized amplified DNA templates. After amplification, a single DNA-carrying bead was placed into each well of a PicoTiterPlate (PTP) device. Multiple samples were sequenced simultaneously on a single PTP using a four-region gasket. The PTP was then inserted into the FLX Genome Titanium sequencer for pyrosequencing [13,14], and sequencing reagents were flowed sequentially over the plate. Information from the PTP wells was captured simultaneously by a camera, and the images were processed in real time by an onboard computer. Multiplex identifiers were used to tag each sample uniquely in a GS FLX Titanium sequencing run; these were recognized by the GS data analysis software after the run and provided high confidence in assigning individual sequencing reads to the correct sample. Sequence assembly was performed after sequencing using GS De Novo Assembler software (Roche) to produce contigs and singletons. All sequence data were mapped to reference sequences using GS Reference Mapper software (Roche).

Discovery of cDNA-SSR Markers
All contigs and singletons from both transcriptomes were mined for SSR motifs using the ARGOS pipeline program (version 1.46) at the default settings to survey the molecular markers present in the V. faba accessions [15]. Parameters were set to identify perfect di-, tri-, tetra-, penta-, and hexa-nucleotide motifs with a minimum of six repeats. Primer design parameters were set as follows: length range, 18-23 nucleotides with 21 as optimum; PCR product size range, 100-400 bp; optimum annealing temperature, 55 °C; and GC content, 40-60% with 50% as optimum. For EST-SSR marker validation, genomic DNA was extracted from 18 diverse faba bean accessions using a DNeasy® Plant Mini kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. Fresh leaf tissue from each accession was ground well in liquid nitrogen and used for each extraction. DNA was resuspended in 100 μL water, diluted to 10 ng/μL, and stored at either −20 °C or −80 °C. Randomly selected EST-SSR primer pairs were validated experimentally, and forward primers were synthesized with an added M13 sequence to enable fluorescent tail addition during PCR amplification [16]. PCR conditions included a hot start at 95 °C for 10 min, followed by 10 cycles at 94 °C for 30 s, 60-50 °C for 30 s, and 72 °C for 30 s, followed by 25 cycles at 94 °C for 30 s, 50 °C for 30 s, and 72 °C for 30 s, and a final elongation step at 72 °C for 10 min. PCR products were separated and visualized using the QIAxcel Gel Electrophoresis System (Qiagen).
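The motif criteria stated above (perfect di- to hexa-nucleotide repeats, minimum of six repeat units) can be illustrated with a short Python sketch. This is not the ARGOS pipeline, whose internals are not described here; it is a simplified regular-expression search, and the function name find_perfect_ssrs and the choice to discard homopolymer runs are assumptions made only for illustration.

```python
import re

# Minimum repeat count taken from the text (six repeats for every motif size).
MIN_REPEATS = 6

# A captured unit of 2-6 bases followed by at least MIN_REPEATS - 1 further
# copies of itself. The lazy quantifier tries the shortest unit first, so
# (GAA)n is reported as a trinucleotide rather than as a hexanucleotide.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (MIN_REPEATS - 1))


def find_perfect_ssrs(sequence):
    """Return (start, motif, repeat_count) for each perfect SSR in sequence.

    Hypothetical helper for illustration only; start is a 0-based offset.
    """
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(1)
        # Assumption: homopolymer runs such as AAAAAAAAAAAA are not SSRs
        # of interest, so units made of a single base are skipped.
        if len(set(motif)) == 1:
            continue
        repeats = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, repeats))
    return hits


if __name__ == "__main__":
    # Toy sequence, not real faba bean data.
    toy = "TTCC" + "GAA" * 6 + "CCTT" + "AG" * 7 + "C"
    for start, motif, repeats in find_perfect_ssrs(toy):
        print(start, motif, repeats)
```

A run of only five repeat units is not reported, which mirrors the six-repeat threshold; compound or imperfect SSRs would need a different search.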

Data Analysis
These 55 SSR loci were screened in 32 accessions (Table 3). The number of alleles (NA), major allele frequency (MAF), observed heterozygosity (HO), expected heterozygosity (HE), number of genotypes (NG), and polymorphism information content (PIC) were calculated using GenAlEx (version 6.5) [17].
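For readers who want to see how such per-locus summary statistics are commonly defined, the following Python sketch computes them for one diploid locus. It uses widely cited textbook formulas, namely HE = 1 - Σp_i² and PIC = 1 - Σp_i² - ΣΣ_{i<j} 2p_i²p_j²; whether GenAlEx applies exactly these forms (for example, with or without a small-sample correction) is an assumption here, and the function name locus_stats and the input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Summarise one diploid locus.

    genotypes: list of (allele, allele) pairs, with None for missing calls.
    Hypothetical helper; formulas are the uncorrected textbook versions.
    """
    # Drop missing calls and make each genotype order-independent.
    called = [tuple(sorted(g)) for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")

    allele_counts = Counter(a for g in called for a in g)
    total = sum(allele_counts.values())
    freqs = [count / total for count in allele_counts.values()]

    sum_sq = sum(p * p for p in freqs)
    # Cross term of the PIC formula, summed over all unordered allele pairs.
    cross = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))

    return {
        "NA": len(allele_counts),                       # number of alleles
        "MAF": max(freqs),                              # major allele frequency
        "NG": len(set(called)),                         # number of genotypes
        "HO": sum(a != b for a, b in called) / len(called),
        "HE": 1.0 - sum_sq,
        "PIC": 1.0 - sum_sq - cross,
    }


if __name__ == "__main__":
    # Toy fragment sizes (bp) for four individuals, not data from this study.
    toy = [(100, 100), (100, 100), (100, 104), (104, 104)]
    print(locus_stats(toy))
```

Applying the function to every locus and averaging each statistic would give summary values of the kind reported in the Results; a predominantly homozygous sample gives HO well below HE.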

Conclusions
In our study, we developed 55 cDNA-SSR markers and used them successfully to investigate the genetic diversity among 32 accessions of faba bean. However, there seems to be a relatively higher genetic diversity within V. faba, as only 55 of 240 cDNA-SSR loci exhibited polymorphism. The availability of co-dominant polymorphic cDNA-SSR markers provides a tool set for further studies on molecular diversity and will greatly facilitate studies of the genetic structure of V. faba populations and the identification and conservation of faba bean.