Development and Evaluation of a Novel Set of EST-SSR Markers Based on Transcriptome Sequences of Black Locust (Robinia pseudoacacia L.)

Black locust (Robinia pseudoacacia L. of the family Fabaceae) is an ecologically and economically important deciduous tree. However, few genomic resources are available for this forest species, and few effective expressed sequence tag-derived simple sequence repeat (EST-SSR) markers have been developed to date. In this study, paired-end sequencing on the Illumina HiSeq™ 2000 platform was used to sequence transcriptomes of R. pseudoacacia, and EST-SSR loci were identified following de novo assembly. A total of 1697 primer pairs were successfully designed, of which 286 met the screening criteria; 94 pairs were randomly selected and validated by polymerase chain reaction amplification. Forty-five primer pairs were verified as polymorphic, with clear bands. The polymorphism information content values were 0.033–0.765, the number of alleles per locus ranged from 2 to 10, and the observed and expected heterozygosities were 0.000–0.931 and 0.035–0.810, respectively, indicating a high level of informativeness. Subsequently, the 45 polymorphic EST-SSR loci were tested for amplification efficiency, using the verified primers, in an additional nine species of Leguminosae; 23 loci were amplified in more than three species, of which two were amplified successfully in all species. These EST-SSR markers provide a valuable tool for investigating the genetic diversity and population structure of R. pseudoacacia, constructing a DNA fingerprint database, performing quantitative trait locus mapping, and preserving genetic information.


Introduction
Black locust (Robinia pseudoacacia L.), a model species of Robinia, is a deciduous forest tree that has significant economic and ecologic value in China, due to its rapid growth, excellent drought resistance, and good adaptability to the local environment [1,2]. It originated in the southeast region of North America and was first introduced to Nanjing in 1877-1878. The black locust was planted widely in China at the end of the 19th century, and it developed successfully as an exotic species [3,4]. Currently, R. pseudoacacia is distributed widely in China, particularly in regions north of the Yangtze River, and plays an essential role in afforestation and environmental improvement. Therefore, successful management and development of this resource requires understanding of the genetic diversity and population structure of natural populations, to maximize the selection, conservation, and utilization of elite germplasms [5]. However, the limited number of efficient molecular markers, particularly codominant markers such as simple sequence repeats (SSRs), greatly obstructs its use in breeding studies.
Microsatellite markers (or SSRs) are widely used molecular markers for studying the population genetics of plants and animals [6][7][8]. SSR markers are superior to other traditional DNA-based molecular markers (e.g., restriction fragment length polymorphism (RFLP), random amplification of polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), sequence-related amplified polymorphism (SRAP), and inter simple sequence repeats (ISSR)) [9][10][11][12][13][14] as they display codominance, high information content, and locus specificity. With the development of high-throughput sequencing technologies, a large number of single nucleotide polymorphism (SNP) markers have been developed in recent years [15]. However, SSR markers remain a powerful tool for studying genetic diversity, determining population structure, constructing DNA fingerprint databases, generating genetic maps, and supporting molecular marker-assisted breeding, due to their good reproducibility and cost-effectiveness [16]. SSR markers developed for black locust are rare; Lian and Hogetsu [17] isolated seven polymorphic microsatellite loci using a dual-suppression-PCR technique, and Mishima et al. [18] isolated 11 microsatellite loci using an enrichment method [19] with some modifications. These few SSR markers were developed from genomic DNA rather than from expressed sequence tags (ESTs). SSR markers are of two types: genomic DNA-derived SSRs (G-SSRs) and EST-derived SSRs (EST-SSRs). Abundant EST sequences and molecular markers can be generated by transcriptome sequencing, which involves constructing a complementary DNA (cDNA) library, followed by sequencing using the Illumina HiSeq sequencing platform [20,21]. EST-SSRs have an advantage over G-SSR markers in that EST-SSRs are derived from the coding regions of genomes [22][23][24][25][26]; therefore, they demonstrate high amplification efficiency and reveal conserved sequences among related species [27,28].
Although EST-SSRs are widely developed and have been applied to numerous species, reports on EST-SSR markers for black locust using transcriptome data are scarce, greatly limiting research on genetic variation, germplasm preservation, and molecular breeding in this species.
In the present study, we developed EST-SSRs from the transcriptome sequencing of R. pseudoacacia by carefully selecting 45 EST-SSR primer pairs with high resolution and high polymorphism, to assess the genetic diversity and population structure. Moreover, this technique facilitated the construction of a DNA fingerprint database of black locust, providing tools for understanding the genetic variation, developing appropriate germplasm conservation strategies, and laying the foundation for future molecular genetic research.

Unigene Acquisition
The unigenes used to develop SSR markers in this study came from the transcriptome sequencing data acquired by Wang et al. [29] (accession number: PRJNA260115), and are listed in Supplementary Materials S1.

EST-SSR Detection and Primer Design
Potential microsatellite repeats were detected from 36,533 unigenes using the MIcroSAtellite perl script [29]. The SSR motifs were searched for mono-, di-, tri-, tetra-, penta- and hexa-nucleotides, with a minimum number of repeat units of 12, 6, 5, 5, 4, and 4, respectively. Mono-nucleotide repeats were removed from the analysis, because the sequencing itself generates stretch errors [30]. The online tool BatchPrimer3 was used to design SSR-specific primers [31]. The major parameters for primer pair design were as follows: a minimum number of SSR pattern repeats of 10 for di-nucleotides, seven for tri-nucleotides, four for tetra-nucleotides, four for penta-nucleotides and three for hexa-nucleotides; product sizes of 100-500 bp (optimal: 150 bp); primer length of 18-25 bases (optimal: 21 bases); GC content of 30-70% (optimal: 50%); annealing temperatures of 50-60 °C (optimal: 56 °C); and default values for the other parameters.
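The repeat thresholds used for SSR detection can be illustrated with a short Python sketch. This is not the MIcroSAtellite script itself; it is a minimal regular-expression scan for perfect repeats, assuming the di- to hexa-nucleotide minimums given above (6, 5, 5, 4, and 4). The function name `find_ssrs` and the output format are hypothetical.

```python
import re

# Minimum repeat counts per motif length, as stated in the text
# (mono-nucleotide repeats are excluded from the analysis).
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # The group captures one motif; the backreference demands enough
        # additional copies to reach the minimum repeat count.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and is_primitive(m.group(1)):
                hits.append((m.start(), m.group(1), len(m.group(0)) // unit_len))
                pos = m.end()  # continue after the reported locus
            else:
                pos += 1
    return sorted(hits)
```

For example, `find_ssrs("CCC" + "AG" * 6 + "TTT")` reports one di-nucleotide locus with six repeats starting at position 3, whereas five copies of `AG` fall below the threshold and are not reported.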

EST-SSR Identification and Validation
For polymorphism analyses of the EST-SSRs, 32 individuals of R. pseudoacacia (Table S1) were used. PCR was used to validate the synthesized primer pairs in a reaction volume of 20 µL, which contained 2 µL genomic DNA (20 ng/µL), 10 µL of 2× TSINGKE® Master Mix (blue) (Beijing TsingKe Biotech Co., Ltd., Beijing, China), 4 µL of M13 primer (1 µM; 5′-TGTAAAACGACGGCCAGT-3′) labeled at the 5′ end of the forward primer with a fluorescent dye (FAM, HEX, ROX, or TAMRA), using a technique that could easily identify the four fluorescent labels [32], 0.8 µL of the forward primer (1 µM), and 3.2 µL of the reverse primer (1 µM). PCR conditions were as follows: denaturation at 94 °C for 4 min, followed by 28 cycles at 94 °C for 30 s, 55-59 °C for 30 s (optimal annealing temperatures are given in Table 1), and 72 °C for 1 min, followed by 10 cycles at 94 °C for 30 s, 50 °C for 30 s, and 72 °C for 45 s, and a final extension at 72 °C for 10 min, using a BIO-RAD T100 thermal cycler [32,33]. PCR products were analyzed using an ABI 3730XL DNA Sequencer (Applied Biosystems, Foster City, CA, USA). The alleles of SSRs were confirmed using the GeneMarker version 2.2.0 software package (SoftGenetics LLC, State College, PA, USA). POPGENE version 1.32 software [34] was used to evaluate population genetic parameters including the observed number of alleles (Na), effective number of alleles (Ne), Shannon's Information index (I), observed heterozygosity (Ho), expected heterozygosity (He), and Hardy-Weinberg equilibrium (HWE) [34]. Polymorphism information content (PIC) was calculated using PIC-CALC version 0.6 [35], and the null allele frequency (Null freq.) was estimated using Cervus 3.0 software [36].
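To make the per-locus parameters concrete, the following Python sketch computes Na, Ne, I, Ho, He, and PIC from a list of diploid genotypes. It is an illustration under stated assumptions, not a reimplementation of POPGENE or PIC-CALC: He is taken here as Nei's uncorrected gene diversity (1 minus the sum of squared allele frequencies), PIC follows the standard Botstein et al. formula, and the function name `locus_stats` and the input format are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def locus_stats(genotypes):
    """Per-locus diversity statistics from diploid genotypes.

    genotypes: list of (allele_1, allele_2) tuples, e.g. fragment sizes in bp.
    """
    alleles = [a for g in genotypes for a in g]
    n = len(alleles)
    freqs = [c / n for c in Counter(alleles).values()]
    sum_sq = sum(p * p for p in freqs)  # expected homozygosity
    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    pic = 1.0 - sum_sq - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return {
        "Na": len(freqs),                          # observed number of alleles
        "Ne": 1.0 / sum_sq,                        # effective number of alleles
        "I": -sum(p * log(p) for p in freqs),      # Shannon's information index
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "He": 1.0 - sum_sq,                        # assumption: uncorrected He
        "PIC": pic,
    }
```

For four individuals with genotypes `(100, 100)`, `(100, 104)`, `(104, 104)`, and `(100, 104)`, both alleles have frequency 0.5, giving Na = 2, Ne = 2, Ho = He = 0.5, and PIC = 0.375.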

EST-SSR Amplification in Related Species
EST-SSR markers are often quite well-conserved among congeneric species. Six genera of Leguminosae from Beijing Forestry University Campus, China (40°0′22″ N, 116°21′1″ E) were used to evaluate the potential value of the developed set of 45 EST-SSR markers in other related species, including Gleditsia (Caesalpinioideae), Cercis (Caesalpinioideae), Wisteria (Papilionoideae), Trifolium (Papilionoideae), Amorpha (Papilionoideae), Sophora (Papilionoideae), and Robinia (Papilionoideae). A total of nine species were selected in the genera Sophora, including Sophora japonica and Sophora japonica var. pendula, and Robinia, including R. pseudoacacia 'Frisia' and R. pseudoacacia var. decaisneana. Genomic DNA extraction, PCR amplification, and fluorescence modification were performed as described above, except that the annealing temperature was re-optimized for each locus.

Results and Discussion
A total of 5072 potential EST-SSR loci were identified from 36,533 unigenes. Tri-nucleotides were the most abundant type of repeat (2321, 45.761%), followed by di-nucleotide repeats (1889, 37.244%), tetra-nucleotide repeats (270, 5.323%), and the penta- and hexa-nucleotide repeats at approximately equal frequencies (Table 2). In total, 2486 primer pairs were successfully designed from 4781 potential EST-SSR loci, of which 789 primers targeted the same unigenes, and we obtained 1697 SSR primer pairs targeting unique unigenes. These primers met a series of rigorous screening criteria described in the previous section. Subsequently, 286 primers were successfully selected, of which 94 pairs were randomly selected and synthesized by Sangon Biotech (Shanghai, China) for validation. Ultimately, these 94 primer pairs were used to evaluate whether the potential EST-SSRs were polymorphic and informative for use in germplasm conservation and population genetics. This set of 94 primer pairs targeting EST-SSRs was further filtered by analyzing a group of germplasms containing 32 black locust individuals collected from four different regions in China.
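The counts reported above can be cross-checked with a few lines of Python. The figures are taken from the text; the variable names are illustrative, and the combined penta- and hexa-nucleotide count is only an inference that assumes the five motif classes account for all 5072 loci (Table 2 is not reproduced here).

```python
total_loci = 5072
repeat_counts = {"tri": 2321, "di": 1889, "tetra": 270}

# Share of each repeat class among all detected loci, in percent.
percentages = {
    motif: round(100 * count / total_loci, 3)
    for motif, count in repeat_counts.items()
}

# Assumption: whatever is left belongs to penta- and hexa-nucleotide repeats.
penta_plus_hexa = total_loci - sum(repeat_counts.values())

# Primer funnel: designed pairs minus those hitting an already-covered unigene.
designed, redundant = 2486, 789
unique_pairs = designed - redundant

# Fraction of the 94 tested pairs that were polymorphic (45 reported).
polymorphic_rate = round(100 * 45 / 94, 1)

print(percentages, penta_plus_hexa, unique_pairs, polymorphic_rate)
```

The percentages reproduce the reported 45.761%, 37.244%, and 5.323%, the primer subtraction gives the stated 1697 unique pairs, and roughly 48% of the tested primer pairs turned out to be polymorphic.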
Detailed characteristics, primer sequences, and genetic information regarding the novel set of 45 polymorphic EST-SSR markers are presented in Table 1. Here, there are three cases that explain the null allele frequency: (1) When the microsatellite flanking sequence is mutated, one (or both) primer(s) do not bind to their target site at a particular allele, resulting in the locus failing to amplify; (2) When the difference in the size of the alleles is greater than 150 bp, the smaller allele is amplified much more efficiently than the larger fragment; a few samples show only the small fragment allele, resulting in site band deletion or an excess of homozygous individuals; (3) If one of the two alleles in the heterozygous state of a given locus is a null allele, only a single band appears after PCR, and the locus is therefore mistakenly interpreted as being homozygous. Therefore, a high incidence of null alleles always causes an excess of homozygotes (i.e., heterozygote deficiency).
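The link between null alleles and heterozygote deficiency can be sketched numerically. The snippet below uses Brookfield's estimator 1, r = (He - Ho) / (1 + He), which attributes the whole heterozygote deficit to null alleles. This is a swapped-in, simpler estimator chosen for illustration; it is not the algorithm implemented in Cervus, and the function name `null_allele_freq` is hypothetical.

```python
def null_allele_freq(ho, he):
    """Brookfield (1996) estimator 1 of null allele frequency.

    Assumes the deficit of heterozygotes (He - Ho) is caused entirely by
    null alleles. Negative estimates (heterozygote excess) are clamped to 0.
    """
    return max(0.0, (he - ho) / (1.0 + he))
```

For a locus with He = 0.5 and Ho = 0.3, the estimate is 0.2 / 1.5, or about 0.133; a locus with Ho at or above He yields 0.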
SSR markers of R. pseudoacacia have been developed and used in other related species [17,18]. The Na ranged from 4 to 12, with an average of 8.2; Ho and He ranged from 0.333 to 0.821 and 0.489 to 0.867, respectively (based on 11 SSR markers from 39 individuals) [18]. In red clover (Trifolium pratense L.), the Na ranged from 2 to 25, and the average Ho and He values were 0.71 and 0.88, respectively (based on 27 SSR markers from 24 individuals) [35]. In Pisum sativum (Leguminosae), the Na ranged from 1 to 7, and the Ho and He ranged from 0 to 0.889 and 0 to 0.840, respectively (based on 41 SSR markers from 32 individuals) [36].
In summary, these 45 primer pairs targeting SSRs identified abundant polymorphisms that can be used to evaluate the genetic diversity and population structure of this species, and to provide a practical strategy for selecting elite germplasms for conservation and utilization. Furthermore, we report the development, synthesis, and verification of SSR markers using Illumina paired-end sequencing of R. pseudoacacia. Using this set of EST-SSR markers, additional research can be carried out to investigate inter- and intra-species relationships and to construct genetic linkage maps and association maps of R. pseudoacacia.

Conclusions
A novel set of EST-SSR markers in black locust was successfully developed and characterized via transcriptome sequencing. The 45 SSR primer pairs displayed abundant polymorphisms that can be used to evaluate the genetic diversity and population structure of this species, construct a DNA fingerprint database, and provide a practical strategy for selecting elite germplasms for conservation and utilization.