Characterization and Application of EST-SSR Markers Developed from Transcriptome Sequences in Elymus breviaristatus (Poaceae: Triticeae)

Background: Elymus L. is the largest genus in the Triticeae tribe. Most species in this genus are highly stress resistant, with excellent forage value. Elymus breviaristatus, a rare species endemic to the Qinghai-Tibet Plateau (QTP), is declining due to habitat fragmentation. However, genetic data for E. breviaristatus are limited, with expressed sequence tag (EST) markers being particularly rare, hampering genetic studies and protection measures. Results: We obtained 9.06 Gb clean sequences from the transcriptome of E. breviaristatus, generating 171,522 unigenes, which were assembled and functionally annotated against five public databases. We identified 30,668 SSRs in the E. breviaristatus transcriptome, from which 103 EST-SSR primer pairs were randomly selected. Of these, 58 pairs of amplified products of the expected size, and 18 of the amplified products were polymorphic. Model-based Bayesian clustering, the unweighted pair group method with arithmetic average (UPGMA), and principal coordinate analysis (PCoA) of 179 wild E. breviaristatus in 12 populations using these EST-SSRs were generally consistent, grouping the 12 populations into two major clades. Analysis of molecular variance (AMOVA) found 70% of the genetic variation among the 12 populations and 30% within the populations, indicating a high level of genetic differentiation (or low gene exchange) among the 12 populations. The transferability of the 58 successful EST-SSR primers to 22 related hexaploid species was 86.2–98.3%. UPGMA analysis generally grouped species with similar genome types together. Conclusions: Here, we developed EST-SSR markers from the transcriptome of E. breviaristatus. The transferability of these markers was evaluated, and the genetic structure and diversity of E. breviaristatus were explored. Our results provide a basis for the conservation and management of this endangered species, and the obtained molecular markers represent valuable resources for the exploration of genetic relationships among species in the Elymus genus.


Introduction
Elymus sp. is an economically important genus in the grass tribe Triticeae in the family Poaceae, which is the largest genus in the Triticeae and contains more than 150 species worldwide [1]. Most species in this genus are widely distributed in temperate regions and have excellent resistance to disease, drought, and cold [2]. Therefore, Elymus represents an indispensable genetic resource for the improvement of stress tolerance in related crops and forage grass [3]. Elymus breviaristatus (Keng) Keng f. is a bunch-type, self-pollinating, short-lived, perennial forage grass with an StStHHYY genome (2n = 6x = 42) endemic to the Qinghai-Tibet Plateau (QTP) [4]. Because of its strong adaptability and high feeding value, E. breviaristatus is considered an excellent forage grass in the alpine region, which plays an important role in grassland husbandry and ecological restoration on the QTP [5]. However, due to increased human activity and habitat deterioration on the QTP, the wild populations of E. breviaristatus have been significantly reducing over the past years, and it has been listed as an important conserved wild plant in China (Class II) since 1999. Therefore, it is critical to accelerate the conservation and utilization of E. breviaristatus, particularly in revealing the distribution and genetic diversity of wild populations.
In recent decades, many types of molecular markers have been widely used in species diversity assessments, molecular-assisted breeding, DNA fingerprinting, and conservation biology [6][7][8]. Simple sequence repeats (SSRs) markers tend to be commonly used because of their high polymorphism, co-dominant inheritance, wide genome coverage, and good reproducibility [9][10][11]. In particular, expressed sequence tag SSRs (EST-SSRs) show more convenient development and higher transferability across related species than genomic SSRs, which have a wide range of applications in population genetics and breeding of crop and forage grass [12], especially in non-model species. However, owing to a lack of genetic and genomic information, there have been no investigations of EST-SSR markers in E. breviaristatus. Although previous studies have used the EST-SSRs of related Elymus species and SRAPs to analyze the genetic structure of E. breviaristatus [13,14], the number of markers was too small, and detection efficiencies remain to be improved. Thus, the more and higher efficient markers are needed for an in-depth understanding of the diversity of E. breviaristatus and to develop strategies for their sustainable utilization.
As next-generation sequencing techniques have become more advanced, several new methods of SSR marker development have been reported [15,16]. For example, de novo transcriptome sequencing is an advanced high-throughput sequencing technique that can generate transcriptome sequences containing large numbers of EST-SSRs within a short timeframe [17]. EST-SSRs developed from the transcriptional regions of the genome are evolutionarily conserved and have a high level of transferability to related species [18]. However, to date, the transcriptome of E. breviaristatus remains unavailable, which has hindered critical aspects of research into this species, such as protection management and the molecular mechanisms underlying its desirable traits. Here, we sequenced the transcriptome of E. breviaristatus using the Illumina Hiseq platform, and then assembled and annotated the sequence data, which was the first systematic report of the transcriptome of E. breviaristatus. Based on these data, we identified EST-SSR loci and developed primer pairs. The major objectives of this study were to develop efficient EST-SSR markers for genetic diversity analyses of the endangered plant E. breviaristatus and to assess the applicability of these markers to related species.

Plant Materials
The leaves used for RNA isolation and transcriptome sequencing were collected from a single individual of E. breviaristatus cv. Chuanxi growing in Hongyuan county, China (latitude, 32.78 • N; longitude, 102.54 • E; altitude, 3502 m). For the genetic diversity analysis, we collected 179 E. breviaristatus individuals from 12 locations on the Qinghai-Tibet Plateau ( Figure 1, Table S1). In each of the 12 locations, we collected samples from 14-15 individual plants that were a minimum of 5 m apart. In addition, a total of 23 related allohexaploid species of Elymus and Roegneria in Triticeae were used to test the cross-species transferability of the identified EST-SSRs (Table S2). All materials in the study were planted in the forage grass nursery of Southwest Minzu University (Hongyuan, Sichuan, China).

RNA Sequencing, Transcriptome Assembly, and Transcriptome Annotation
RNA was isolated using the Plant Total RNA Kit (Tiangen Biotech Co., Ltd., Beijing, China), following the manufacturer's instructions. RNA quality was verified using an Agilent Bioanalyzer 2100 system (Agilent Technologies, Palo Alto, CA, USA). RNA degradation and contamination were monitored on 1% agarose gels. Total RNA purity was checked using a NanoPhotometer spectrophotometer (Implen, Calabasas, CA, USA). Sequencing libraries were generated using a NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, San Diego, CA, USA), following the manufacturer's instructions, and were preliminarily quantified using a Qubit2.0 fluorometer (Life Technologies, Carlsbad, CA, USA). The insert size of the library was determined using the Agilent Bioanalyzer 2100. The library preparations were performed on the Illumina Hiseq 2500 platform.
The sequenced raw reads were cleaned by removing adapter sequences and lowquality reads. De novo transcriptome assembly of the clean reads was performed using Trinity [19]. The assembled unigene sequences were searched against several public databases, including Gene Ontology (GO) [20], Clusters of Orthologous Groups of proteins (COG) [21], euKaryotic Orthologous Groups (KOG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) [22,23]. Unigenes were functionally annotated by first searching for homologous sequences in the NCBI non-redundant protein sequence (NR) database using BLAST (E-value < 10 -5 ) [24], and then searching the resulting sequence hits against the Protein family (Pfam) database using HMMER to obtain annotation information [25].

EST-SSR Marker Identification and Primer Design
SSRs were identified in the assembled unigene sequences using the MicroSatellite (MISA) identification tool [26]; the minimum numbers of repeats for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs were set to 10, 6, 5, 5, 5, and 5, respectively. The EST-SSR primers were designed using Primer 3 (http://primer3.sourceforge.net, accessed on 15 March 2020) based on the MISA results. From the designed EST-SSR primers, we randomly selected 103 pairs for synthesis using Tingke Biological Technology (Beijing, China).

DNA Extraction and EST-SSR Amplification
Fresh young leaves were collected from the two sets of plant materials described above and dried in zip-lock plastic bags with silica gel. Total genomic DNA was extracted using a DP350 Plant DNA kit (Tiangen Biotech Co., Ltd., Beijing, China), following the manufacturer's instructions. The quality and quantity of DNA were detected using 1% agarose gels and a NanoDrop-Lite spectrophotometer (Thermo Scientific, Waltham, MA, USA), respectively. DNA samples were diluted to a concentration of 10 ng/µL and stored at −20 • C.
PCR amplifications were performed in a 20-µL reaction volume containing 20 ng of genomic DNA, 0.5 µM of each primer, and 10 µL of 2×Es Taq MasterMix (Dye Plus) (CoWin Biosciences, Beijing, China). A PTC-200 Thermal Cycler (BIO-RAD, CA, United States) was used to run touch-down PCRs with the following cycling conditions: 1 cycle of 94 • C for 4 min; 5 cycles of 94 • C for 30 s, 60-65 • C for 30 s, and 72 • C for 60 s; 35 cycles of 94 • C for 30 s, 60 • C for 30 s, and 72 • C for 60 s; 1 cycle of 72 • C for 10 min; and an indefinite hold at 4 • C. The PCR products were separated using 6% non-denaturing polyacrylamide gel electrophoresis (PAGE). Products were loaded into the gel in 1×TBE (Tris-borate-EDTA) buffer and run at 400 V for 1.5 h, along with a 100 bp molecular size ladder (Tiangen Biotech Co., Ltd., Beijing, China). Bands were then visualized using silver staining.

Statistical Analyses
The EST-SSR profiles obtained for each sample were scored as present (1) or absent (0), and transformed into a raw data matrix. Using this data matrix, we calculated several genetic diversity parameters among the 12 populations with POPGENE1.32 [27]: percentage of polymorphic bands (PPB), Nei's genetic diversity (H), Shannon information index (I), observed number of alleles (Na), number of effective alleles (Ne), and Nei's genetic distance (GD). The PIC was calculated for each primer as "PIC" = 2f(1−f), where f is the amplified allele frequency and 1−f is the frequency of the null allele [28]. Matrixes showing Nei's pairwise genetic distances for 179 E. breviaristatus individuals and 23 related species were calculated using NTSYSpc 2.10e [29]. Dendrograms were constructed using the unweighted pair-group method with an arithmetic mean (UPGMA) algorithm in MEGA 6.0 based on Nei's genetic distances [30]. Principal coordinate analysis (PCoA) and analysis of molecular variance (AMOVA) of the EST-SSRs were performed using GenAlEx 6.5102 [31].
A model-based Bayesian clustering approach in STRUCTURE v.2.3.4 [32] was used to analyze the genetic structure across 179 individuals and to determine the most likely number of clusters (K). For each value of K from 2 to 12, 20 independent runs were performed. Each run comprised 50,000 Markov chain Monte Carlo (MCMC) replicates with an admixture. The first 10,000 replicates were discarded as burn-in. The optimum K value was estimated using Structure Harverster [33] based on Ln P (D) and delta K. Finally, the 10 replicates were clustered and aligned using CLUMPAK [34].

Illumina Sequencing and De Novo Transcriptome Assembly
A total of 61,494,104 paired-end raw sequencing reads were generated and submitted to NCBI's SRA under accession number SRR16939514, after filtering and strict quality checking, 60,407,780 clean reads (9.06 Gb) remained. Across the clean reads, the Q20 and Q30 ratios were over 96% and 91%, respectively. The GC content of the clean reads was 57.08%. A total of 250,182 transcripts were assembled, with an average length of 860 bp. These transcripts represented 171,522 unigenes, with a total length of 187,855,868 bp, an average length of 1095 bp, an N50 of 1409 bp, and an N90 value of 570 bp (Table 1).

Development and Validation of SSR Markers
From all SSR primer pairs, 103 pairs of primers were randomly selected and used to amplify SSRs from the DNA of 12 geographically distinct E. breviaristatus populations. In total, 58 primer pairs amplified PCR products of the expected size (Table S3), and 18 of these amplified SSRs were polymorphic (Table 3). We used the 18 polymorphic primer pairs identified above to investigate the relationships among wild populations and individuals of E. breviaristatus. This analysis generated 164 consistently well-amplified bands (an average of 9.1 bands per primer), among which 146 were polymorphic. The polymorphic information content (PIC) for the 18 primers was 0.267-0.500, with an average of 0.457. The mean of Nei's genetic diversity (H) and Shannon information index (I) ranged from 0.155-0.410 and 0.250-0.588, respectively ( Table 3). The UPGMA dendrogram for the 179 E. breviaristatus individuals recovered the samples in two major clades, with one clade including one subclade and the other clade including five subclades ( Figure 6A). Across the five subclades, all populations were monophyletic, except for population 10, which was polyphyletic (although still in the same subclade; Figure 6A).   We assessed several indexes of α diversity at the population level. Across all populations, the mean number of alleles (Na) was 5.611-12.000, the mean number of expected alleles (Ne) was 9.513-11.400, the Shannon information index (I) was 0.043-0.237, expected heterozygosity (He) was 0.028-0.153, and unbiased heterozygosity (uHe) was 0.028-0.159 (Table 4). In all indexes, diversity was lowest in Population 5 and highest in Population 1. Pairwise genetic distances ranged from 0.107 (between Population 1 and Population 2) to 0.462 (between Population 5 and Population 9). The UPGMA dendrogram for the 12 populations showed them divided into two groups, which showed that group I contained populations 1-8, and group II contained populations 9-12 ( Figure 6B). Finally, the analysis of molecular variance (AMOVA) of the EST-SSRs showed that 70% of the genetic variation existed among populations, while only 30% of the genetic variation existed within populations.

Population Genetic Structure of E. breviaristatus
Using a model-based Bayesian clustering approach, we analyzed the genetic structure among the E. breviaristatus individuals and found that ∆K was maximized at K = 2 ( Figure 7A). This indicated that the 179 individuals likely comprised two genetic sub-populations: one sub-population included populations 9-12, while the other subpopulation included populations 1-8 ( Figure 7B). At K = 6, which represented the next highest value of ∆K ( Figure 7B), six sub-populations were grouped as follows: Populations 1-3, Populations 4-5, Population 6-7, Population 8, Population 9, and Population 10-12 ( Figure 7B). These results were consistent with the population-level UPGMA dendrogram ( Figure 6B).
Principal coordinate analysis (PCoA) recovered the 12 populations in two major groups ( Figure 8); the first two principal coordinates explained 33.56% of the total genetic variance. These groups were consistent with those recovered by the clustering approach when K = 2 ( Figure 7B), and fairly consistent with those recovered by the population-level UPGMA analysis ( Figure 6B).

Transferability of EST-SSRs to Related Species
To evaluate the transferability of the EST-SSRs from E. breviaristatus, we performed cross-species amplification analysis of these 58 primer pairs in the Triticeae species with StHY, StStH, StHH, and StStY genomes. Transferability was greatest in E. purpuraristatus (57 successful EST-SSRs; 98.3%) and lowest in E. tangutorum (50 successful EST-SSRs; 86.2%). The overall transferability rate was 91.2%. Among the 23 species, a total of 391 reliable bands were generated, of which 369 bands (94.37%) were polymorphic. The Dice genetic similarities (GS) of 23 species ranged from 0.551 (E. tangutorum and E. repens)-0.915 (E. excelsus and E. purpuraristatus), which showed a high level of genetic variation among the species. The dendrogram constructed based on Dice GS values with UPGMA analysis obviously divided the 23 accessions into two groups (100% bootstrap support), which showed that the species with the same or similar genomes could be grouped together (Figure 9). Cluster I was comprised of E. patagonicus (StHH), E. scabriglumis (StHH), E. repens (StStH), E. transhyrcanus (StStH), and E. glaucissimus (StStY). Cluster II mainly contained all the StYH genome species except one StStY species, E. tschimganicus.

Characterization of SSRs in the Transcriptome
As next-generation sequencing has become increasingly accurate, rapid, and inexpensive, SSR markers have been developed for a number of species. At present, SSRs have been used to explore genetic diversity and population structure in several Elymus species [35][36][37]. However, to date, there are almost no available EST sequences for E. breviaristatus. Here, we provide the first report of the E. breviaristatus transcriptome, which was generated using next-generation sequencing. In total, 60,407,780 clean reads with a Q20 of 96.59% were obtained from 61,494,104 paired-end raw reads, and 171,522 unigenes were identified in the assembled transcriptome with a mean length of 1095 bp and an N50 of 1409 bp, which ensured the quality of sequencing. The N50 values of unigenes were consistent with the results reported in Glycyrrhiza uralensis (1395 bp) [38], radish (Raphanus sativus) (1256 bp) [39], and common bean (1449 bp) [40]. Overall, the transcriptome assembly data of E. breviaristatus generated in the present study will be valuable for use in subsequent analyses.
From the 25,834 SSR-containing sequences in the E. breviaristatus transcriptome, we identified 30,668 SSRs. The most common motifs were trinucleotide repeats (14,416 SSRs; 47.01%), mononucleotide repeats (9.096 SSRs; 29.66%), and dinucleotide repeats (6241 SSRs; 20.35%). Similar patterns of SSR motif abundance have been found in most other plant genomes, such as E. nutans [41] and alfalfa [42], showing that the trinucleotide repeat is the most common motif (64.27% and 48.80%, respectively). Previous studies have suggested that the most abundant trinucleotide repetition may be affected by harmful mutations in translated regions, resulting in the inhibition of the expansion or contraction of dinucleotide repeats in exons [43]. As shown in Table S4, among all types of sequence motifs identified in the E. breviaristatus, excluding the mononucleotide repeats, the CCG/CGG motif was the most common (5823 SSR), followed by AG/CT (3602 SSRs), AGG/CCT (2548 SSRs), and AGC/CTG (2258 SSRs). These results were consistent with patterns of sequence motifs found in the congeneric E. nutans [41] but differed from patterns of sequence motifs found in dicotyledonous species, such as Medicago truncatula [44], Ipomoea batatas [45], and Menispermum dauricum [46]. This reflects the difference between monocotyledons and dicotyledons. Previous studies showed that the CCG/CGG was the most abundant motif in monocotyledons but a rare repeat type in dicotyledons [47,48].
Of the 103 randomly selected SSR primer pairs, 58 successfully amplified fragments from E. breviaristatus DNA, and 18 of these fragments were polymorphic. This PCR success rate (56.31%) was higher than that reported for tree peony (47.30%) [49] and alfalfa (30%) [42]. However, the percentage of EST-SSR polymorphic markers in the present study was 17.48%, which did not reach the rate of the above species (39.90% and 29%). The differences in PCR success rates among these species might be due to differences in ploidy, while the level of polymorphism may be limited by the sample size and geographical origin of the materials used [50]. In general, the genetic data obtained in the study compensate for the lack of a reference genome for E. breviaristatus, and our transcriptome sequencing data will support novel gene discovery or the investigation of molecular mechanisms in this species and related taxa.

Genetic Diversity and Structure of E. breviarstatus Populations
Based on the amplification results of the 18 polymorphic EST-SSR primers, the percentage of polymorphic bands and the average PIC value were higher than those of a previous study of E. breviaristatus, which developed EST-SSR markers from E. wawawaiensis and E. lanceolatus [13], indicating that our primers might be more suitable for studies of genetic diversity and population structure. However, genetic diversity analyses of the 12 E. breviaristatus populations showed that the Shannon index (I) across the populations was 0.127, which is lower than that reported in other Elymus species, including E. sibiricus (0.297) and E. nutans (0.344) [15]. Previous studies have similarly shown that genetic diversity among and within E. breviaristatus populations is low [14]. This low level of population genetic diversity was unsurprising, as E. breviaristatus is a rare species endemic to QTP, and its genetic diversity may be restricted by geographical scope, environmental factors, and mating systems [51].
UPGMA cluster analysis, PCoA analysis, and Bayesian population structure analysis suggested that 179 E. breviaristatus accessions could be divided into two major clades, which revealed an obvious pattern of genetic differentiation between the major groups of E. breviaristatus populations. This consistency suggests that our results were reliable. Meanwhile, the AMOVA showed that genetic variation mainly occurred among populations rather than within populations. Similar results have been reported for several other self-pollinating Elymus species, including E. canadensis [52], E. fibrosus [53], and E. caninus [54]. Compared to the results of previous studies [13,14], the genetic variation within E. breviaristatus populations in our study area was extremely low, which may be due to the complex environment of the Qinghai-Tibet Plateau leading to multiple genetic differentiation mechanisms of the same species in different regions [55]. In general, our results will be useful for developing protection management plans for this species, such as in situ conservation strategies that limit grazing and reduce human disturbance to protect existing populations with high levels of genetic diversity [51,56], while ex situ conservation strategies can establish a diversified genetic resource bank of E. breviaristatus by collecting representative materials from populations with high genetic differentiation [13].

Transferability of EST-SSRs and Phylogenetic Relationships of Elymus Species
Due to the conservative nature of the flanking sequences of repeat loci, the SSR markers showed transferability in related species. Generally, the EST-SSRs have a higher transferability than other molecular markers on account of the conservation of transcribed regions among related species [12]. In the study, we also explored the transferability of the 58 successful EST-SSRs to 22 related species. In each of the related species, 86.2-98.3% of the primer pairs were successful, which showed higher transferability in the hexaploid species of Elymus and Roegneria. The transferability of these EST-SSRs was similar to the transferability ratios observed in E. wawawaiensis (96.08%) [35] but higher than that observed in E. sibiricus (49.11%) [57]. The different transferability rates may be influenced by the phylogenetic relationships between species. Usually, closely related species or species with high genomic similarity have high marker transferability [15]. In this study, the high transferability of these primers among other Elymus species indicates that these primers represent a valuable resource for the development of molecular markers for Elymus species.
As the largest genus in Triticeae, the taxonomy and biosystematics of Elymus species are extremely complex because of the huge morphological variation, the polyploid origin, and spontaneous hybridizations among the species [2], which is also one of the essential theoretical bases for their utilization. There has been a lot of work done on the phylogenetic relationships of tetraploid species with StH and StY genome constitution, which revealed the origin of the species and the reticulate evolutionary relationships. However, there are only a few studies on the biosystematic relationships of hexaploidy species, especially in StHY species. In this study, the phylogenetic relationships of the hexaploidy species with St, H, and Y genomes were revealed based on SSR data, which showed that the species with the same or similar genomes could be grouped together. The species with the StYH genome were clustered together and showed a relatively distant affinity with other StYY, StStY, StStH, and StHH species. Then, three different variation types were also revealed among the species with the StHY genome, which was also consistent with the morphology. These results indicate that the markers developed in this study would provide a powerful molecular tool for evolutionary adaptation and genetic relationship analysis in Elymus. In addition, the genetic information revealed by these primers can help to determine phylogenetic relationships in the Elymus, particularly if combined with further evidence, such as additional gene sequences and/or cytological [58] or morphological [59] characteristics.

Conclusions
In this study, we were the first to successfully develop 58 EST-SSR markers from the transcriptome of E. breviaristatus, 18 of which were polymorphic. Based on the amplification results of the polymorphic primers. The UPGMA, PCA, and structure analysis support that the genetic background of E. breviaristatus populations can be divided into two categories, which is accompanied by a low level of genetic diversity (mean I = 0.127) and genetic variation among populations was dominant (70%). These genetic characteristics can provide an important basis for further resource evaluation, collection, and conservation. In addition, 58 EST-SSR markers that can be successfully amplified have excellent transferability in related species (86.2% to 98.3%) and can classify most species according to their genome types through the amplification results, which provides valuable resources for exploring the genetic relationship among species in the Elymus genus.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/genes14020302/s1, Table S1: Geographic information of the 12 E. breviaristatus populations; Table S2: Details of the species used to detect the transferability of molecular markers in this study; Table S3: List of the 58 primer pairs with expected size of PCR products; Table S4: Detailed information of EST-SSRs based on the number of nucleotide repeat units.

Data Availability Statement:
We stored the raw sequence in the Sequence Read Archive of the National Center for Biotechnology Information (NCBI), and they are identified with the SRA under the accession number SRR16939514.

Conflicts of Interest:
The authors declare no conflict of interest.