Development of Polymorphic Genic SSR Markers by Transcriptome Sequencing in the Welsh Onion (Allium fistulosum L.)

Transcriptome analysis is an efficient way to explore molecular markers in plant species for which genome sequences have not been published. To address the limited number of markers published for the Welsh onion, this study identified 6486 genic simple sequence repeat (SSR) loci with 1–5 bp repeat motifs, based on next-generation sequencing (NGS) technology and the RNA-Seq approach. The most abundant motif type was mononucleotide (52.33%), followed by trinucleotide (31.96%) and dinucleotide (14.57%). A total of 2525 primer pairs were successfully designed, and 91 of the 311 tested primer pairs were polymorphic. Overall, 38 genic SSR markers were randomly selected to further validate the degree of genetic diversity, and 22 of them (57.89%) showed high levels of polymorphism. The average polymorphism information content (PIC) value and number of alleles (Na) were 0.63 and 5.27, respectively, and the unweighted pair-group method with arithmetic average (UPGMA) cluster analysis grouped the 22 Allium accessions into three groups, with Nei's similarity coefficients ranging from 0.37 to 0.99. These results suggest that these genic SSR markers could be used to develop a higher-resolution genetic map and/or to analyze the phylogenetic relationships among Allium plants in the near future.


Introduction
The Welsh onion (Allium fistulosum L., 2n = 16) is a cultivated species of the genus Allium (Alliaceae), which includes some of the most commercially important biennial and perennial herbs; its plump stalk, including the leaf sheath and tender leaf, is consumed as a vegetable and condiment worldwide [1]. The plant is thought to have originated in northwestern China, and it is widely cultivated worldwide, particularly in East Asian countries such as Japan, China, and Korea [2,3]. The Welsh onion is highly nutritious, with bactericidal and anti-inflammatory benefits, and it has been used as an herbal medicine to treat many diseases [4]. Owing to its great nutritional and medicinal value, the Welsh onion has become one of the main vegetables exported from China in recent years. Additionally, a growing number of researchers have been attempting to develop molecular markers for the Welsh onion, either to construct a genetic map or for use in a marker-assisted selection (MAS) breeding system.
A variety of molecular markers, such as random amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms (AFLP), and sequence characterized amplified regions (SCAR), have been developed and used for molecular marker-assisted selection and genetic diversity analysis in the Welsh onion [5][6][7]. In addition, microsatellites, or simple sequence repeats (SSRs), have been applied to the construction of chromosome maps and to cultivar identification [8][9][10][11]. Compared with other molecular markers, SSR markers are preferable because of their reproducibility, multi-allelic nature, co-dominant inheritance, relative abundance, and good genome coverage [12]; SSR loci are composed of tandemly repeated DNA regions with motif lengths of one to six base pairs (bp) that are spread throughout the genome [13]. In recent years, a large number of genomic SSR markers have been developed in the Welsh onion [14,15], but genic SSR markers were largely ignored until recently, when Tsukazaki et al. described transcriptome shotgun assembly (TSA)-derived markers in Japanese bunching onion [10]. Genic SSRs have distinct advantages over genomic SSRs in terms of reducing the time and cost of marker development, and they are more useful as genetic markers because they are part of the transcriptome and represent the variation in the expressed portion of the genome [16]. Recently, hundreds of genic SSR markers, and even single nucleotide polymorphisms (SNPs), have been developed by analyzing large numbers of ESTs from bulb onion (A. cepa L.) cDNA libraries [17][18][19]. Therefore, developing genic SSR markers from transcriptome sequencing of the Welsh onion is a valuable area of research.
Owing to the lack of whole-genome information and the limited number of markers published for the Welsh onion, the priority of this study was to develop genic SSR markers using a transcriptome approach, thereby providing new molecular markers for genetic diversity studies in the Welsh onion and related Allium species. In addition, these genic SSR markers will provide important resources for chromosome mapping and phylogenetic analysis of Allium plants, potentially contributing to target gene cloning as well as to MAS breeding.

Frequency and Distribution of Genic SSRs in the Welsh Onion Transcriptome
Since the RNA-Seq NGS technology allows for lower cost and higher throughput transcriptome sequencing than traditional Sanger sequencing, it has been widely used in many non-model organisms, including Tapiscia sinensis, Cucurbita moschata, and Larix kaempferi [20][21][22].
In this study, a total of 6766 SSR loci containing 1–5 bp repeat motifs were found in 6206 unigenes of the Welsh onion. Of these, 5497 (88.6%) contained a single SSR locus, 486 (7.8%) contained more than one SSR, and 275 (4.4%) possessed compound SSRs. Moreover, 6486 SSR loci (excluding the 280 compound SSRs) located in 5983 unigenes were counted and presented based on the transcriptome data (Table 1). The repeat motifs among the 6486 SSR loci comprised a total of 186 types, and the number of repeats varied from five to 24. The frequency and number of each SSR motif and the percentage of the primary repeat motifs are presented in Tables 1 and 2. The most abundant motif type was mononucleotide (52.33%), followed by trinucleotide (31.96%), dinucleotide (14.57%), tetranucleotide (1.02%), and pentanucleotide (0.12%), and the most common repeat-number class was n = 10 (32.25%), which was mostly composed of mononucleotide repeats. In addition, the most common class for dinucleotides was n = 6, and for the other repeat motif types it was n = 5. A/T was the most common motif, representing 98.29% of the mononucleotide repeat motifs and 51.43% of the total repeat motifs. Additionally, the dominant repeat motifs were AG/CT (34.29%), AT/AT (33.65%), and AC/GT (31.11%) among the dinucleotides, and AAG/CTT (26.44%) and AAAT/ATTT (33.33%) among the trinucleotides and tetranucleotides, respectively. Among the dinucleotide repeat motifs based on the 6486 SSR loci, the frequency of GA as a primary nucleotide (5.00%) was higher than that of GT (4.53%) in the Welsh onion transcriptome sequences (Table 3), and this result agreed with Tsukazaki et al. (GA 22.1%, GT 15.4%) [10]. Similar results were also obtained for the bulb onion expressed sequence tags (ESTs) (GA 10.1%, GT 9.2%) [17], whereas GT repeats were much more abundant than GA repeats in the bulb onion genome [23], which was also the case in the Welsh onion [9,15]. These results suggest that the ratio of (GT)n:(GA)n may differ between the transcribed and non-transcribed regions of the genome. Additionally, the GA dinucleotide repeat motif can represent multiple codons, depending on the reading frame, and can therefore be translated into different amino acids; GA may be present in the Ala and Leu codons, and these amino acids are reported to be highly abundant in proteins [24].
In addition, AT was the most abundant repeat motif among the dinucleotides and was more frequent than GC in the Welsh onion (Table 1). A similar result was reported by Tsukazaki et al. (AT 25.6%, GC 0.5%) [10], and transcriptome sequencing has shown the AT repeat motif to be more common than GC in many other plant species, such as bulb onion, radish, mung bean, pear, and Levant cotton [16,17,[25][26][27]. Although a large number of AT motifs emerged from the transcriptome sequences, this motif has not typically been used to develop molecular markers because its self-complementary nature leads to the formation of primer dimers [28].
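The motif labels used above (AG/CT, AC/GT, AAG/CTT) pool a motif with its rotations and its reverse complement, and the AT repeat is special because it is its own reverse complement. The following is a minimal Python sketch of both ideas; the function names are illustrative and are not taken from the tools used in this study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def rotations(motif: str) -> list[str]:
    """All cyclic rotations of a motif, e.g. GA -> [GA, AG]."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif: str) -> str:
    """Label such as 'AG/CT': the alphabetically smallest rotation of the
    motif or of its reverse complement, paired with its reverse complement."""
    motif = motif.upper()
    canon = min(rotations(motif) + rotations(reverse_complement(motif)))
    return f"{canon}/{reverse_complement(canon)}"


def is_self_complementary(seq: str) -> bool:
    """True when a sequence equals its own reverse complement, so that two
    copies of the same strand can anneal to each other."""
    return seq.upper() == reverse_complement(seq)


print(motif_class("GA"), motif_class("GT"), motif_class("CTT"))
print(is_self_complementary("AT" * 6), is_self_complementary("GA" * 6))
```

Under this convention GA, AG, TC, and CT all fall into the single class AG/CT, which is why GA and GT frequencies can be compared through the AG/CT and AC/GT classes.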

Development and Detection of Genic SSR Markers
Genic SSR markers are regarded as having great potential for genetic diversity analysis and chromosome mapping in crop species because of their specificity and high degree of conservation [29,30]. Owing to the steady decrease in cost and the increase in data throughput, NGS technology has become a powerful approach for the high-throughput discovery of genes and the generation of large amounts of sequence data for the identification of molecular markers [31,32]. Therefore, the identification and development of genic SSR markers based on NGS is an efficient and cost-effective strategy.
In this study, a total of 2710 unigenes were used to design primers, and primers meeting the design criteria were successfully obtained for 2512 unigenes (92.69%), yielding 2525 primer pairs. A total of 311 (12.32%) primer pairs located in 307 unigenes were randomly selected from the 2525 primer pairs (Table S2) for the preliminary test; these comprised 217 primer pairs (69.77%) for trinucleotide repeats, 50 (16.08%) for dinucleotide repeats, and 38 (12.22%) and six (1.93%) for tetranucleotide and pentanucleotide repeats, respectively. The remaining 2214 primer pairs, located in 2205 unigenes, are presented in Table S1 for reference.
In the preliminary test, a total of 165 (53.05%) primer pairs generated stable and clear bands, of which 91 (29.26% of the 311 tested) were reliably polymorphic (Table S3). Among these genic SSR markers, the majority targeted trinucleotide repeats (69.23%), followed by dinucleotide repeats (19.78%) and tetranucleotide repeats (10.99%). This result was consistent with the order of SSR motif frequency.
The Welsh onion is well suited to genetic studies because its genome is 28% smaller than that of the bulb onion [33]; however, some of the genic SSR markers used for chromosome map construction and cultivar identification in the Welsh onion [8,10,11] were derived from the bulb onion. Genic SSR markers have been far less developed in the Welsh onion than in the bulb onion because the Welsh onion has received less attention. Therefore, given the limited number of SSR markers, we explored additional SSR markers based on transcriptome sequencing to support research into molecular marker-assisted selection and genetic diversity in the Welsh onion.

Validating the Effectiveness of Genic SSR Markers among 22 Allium Accessions
A sample of 38 primer pairs (Table S4) was randomly selected from the 91 polymorphic primer pairs to validate the effectiveness of the genic SSR markers and to differentiate a set of 22 Allium accessions (Table 3). Of these 38 primer pairs, six genic SSR markers could not be effectively amplified, and 10 generated monomorphic or weakly polymorphic bands, which could have been due to the use of inappropriate materials or defective SSR markers. A total of 22 genic SSR markers exhibited a high level of polymorphism (Table 4), and 116 alleles were detected in total. The PCR amplification bands generated by the genic SSR markers MCL32, MCL40, and TO249 in the 22 Allium accessions are displayed in Figure 1. The number of alleles (Na) detected at these loci varied from three to nine, with an average of 5.27 alleles per locus. The PIC value, which indicates the degree of polymorphism at each locus, ranged from 0.44 to 0.85 with a mean of 0.63. According to previously described criteria, three categories were defined as high (PIC > 0.5), moderate (0.25 < PIC < 0.5), and low (PIC < 0.25) [34,35], and a total of 16 (72.73%) primer pairs displayed a high level of PIC. Compared with other plants, the Na and PIC values for the Welsh onion were lower than those reported in pear buds (9.42 and 0.75, respectively) [36]. However, the values were higher than those for sweet potato (2.94 and 0.35, respectively) [25], cowpea (3.06 and 0.53, respectively) [37], and mung bean (3 and 0.34, respectively) [16].
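The text does not state how PIC was computed. As an illustration only, the sketch below assumes the commonly used definition PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a locus, and applies the high/moderate/low thresholds quoted above. The function names and the example frequencies are hypothetical and are not data from this study.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one locus (assumed formula:
    1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2)."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    homozygosity = sum(p * p for p in allele_freqs)
    cross_term = sum(2 * (p * p) * (q * q)
                     for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross_term


def pic_class(value):
    """Categories as quoted in the text; values exactly on a boundary are
    not covered by the quoted criteria and are assigned downward here."""
    if value > 0.5:
        return "high"
    if value > 0.25:
        return "moderate"
    return "low"


# Hypothetical loci: two equally frequent alleles vs. four equally frequent alleles.
two_alleles = pic([0.5, 0.5])
four_alleles = pic([0.25, 0.25, 0.25, 0.25])
print(two_alleles, pic_class(two_alleles))
print(four_alleles, pic_class(four_alleles))
```

With two equally frequent alleles the value is 0.375 (moderate), and with four equally frequent alleles it is about 0.70 (high), which illustrates why loci with more, evenly distributed alleles score higher.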
In recent years, transcriptome sequencing has proven to be one of the most efficient ways of developing genic SSR markers in many plant species, and many SSR markers based on transcriptome sequencing have been widely utilized in genetic diversity analysis, chromosome mapping, and gene-based association studies [19,25,41,42]. Based on the results of our experiments, the 22 genic SSR markers divided the 22 Allium accessions into three groups and seven subgroups using UPGMA cluster analysis. Although the resulting dendrogram could not sufficiently explain the genetic relationships of the accessions, it demonstrated the effectiveness of the 22 genic SSR markers. A possible reason for the inaccurate dendrogram is that the Welsh onion possesses various local cultivars and a complex genetic background; furthermore, useful molecular markers from the Welsh onion transcriptome were lacking. Therefore, developing more and better molecular markers from the Welsh onion transcriptome is important and valuable for the further analysis of Allium phylogenetic relationships.
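For readers unfamiliar with UPGMA, the sketch below shows the clustering rule in plain Python: at each step the two clusters with the smallest average pairwise distance are merged. It assumes that similarities have already been computed and converts them to distances as 1 − similarity; the four accession labels and similarity values are invented for illustration, and the software and exact similarity formula used in this study are not specified in the text.

```python
from itertools import combinations


def upgma(names, distance):
    """Average-linkage (UPGMA) clustering.

    names: list of accession labels.
    distance: function (a, b) -> distance between two accessions.
    Returns the merge steps as (cluster_a, cluster_b, mean_distance),
    where each cluster is a tuple of accession labels.
    """
    clusters = [(name,) for name in names]
    merges = []

    def mean_distance(c1, c2):
        # Unweighted average over every pair of members from the two clusters.
        return sum(distance(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: mean_distance(*pair))
        merges.append((c1, c2, mean_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical similarity coefficients among four accessions.
similarity = {
    frozenset(("A", "B")): 0.9, frozenset(("C", "D")): 0.8,
    frozenset(("A", "C")): 0.4, frozenset(("A", "D")): 0.4,
    frozenset(("B", "C")): 0.4, frozenset(("B", "D")): 0.4,
}


def example_distance(a, b):
    return 1.0 - similarity[frozenset((a, b))]


for left, right, dist in upgma(["A", "B", "C", "D"], example_distance):
    print(left, right, round(dist, 2))
```

This direct implementation recomputes averages at every step, which is adequate for a panel of 22 accessions but would be slow for large data sets.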

Plant Materials and DNA Extraction
The 22 Allium accessions used to test and verify the polymorphic primers in this study (Table 3) were local cultivars or inbred lines. Genomic DNA was extracted from multiple fresh young leaves of each line using the modified hexadecyl trimethyl ammonium bromide (CTAB) method, and the quality and quantity of the DNA were checked in 1% agarose gels and with a Thermo Scientific NanoDrop™ 2000C Spectrophotometer (Thermo Scientific NanoDrop, South Logan, Utah, USA), respectively.

Identification of SSR Loci from the Welsh Onion Transcriptome and Primer Design
Based on the 6206 unigenes obtained by the RNA-Seq approach in a previous study (RNA-Seq data have been uploaded to NCBI under accession numbers SSR1609126 and SSR1609976) [43], 6486 genic simple sequence repeat (SSR) loci located in 5983 unigenes were identified in this study using the MicroSAtellite identification tool (MISA) [44]; 280 compound SSRs were excluded because of statistical complications. SSR loci containing repeat units of 1–5 nucleotides were identified, and the minimum SSR length criteria were defined as ten iterations for mononucleotide repeats, six iterations for dinucleotide repeats, and five iterations for the other repeat units.
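The minimum-repeat criteria above can be illustrated with a short regular-expression search. This is a simplified Python sketch, not the MISA tool itself: it reports perfect repeats only, does not detect compound SSRs, and the function name and test sequence are hypothetical.

```python
import re

# Minimum number of iterations per repeat-unit length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5}


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs in a sequence."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip units that are themselves repeats of a shorter unit
            # (e.g. "AA" or "AGAG"); those are reported at the shorter length.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical unigene fragment containing one mono-, di- and trinucleotide SSR.
fragment = "GGC" + "A" * 12 + "CGT" + "AG" * 7 + "TCC" + "AAG" * 5 + "GCT"
print(find_ssrs(fragment))
```

A run of AG repeated only five times would not be reported, because it falls below the six-iteration threshold for dinucleotide repeats.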
Primer3 software, version 4.0, was used to design polymerase chain reaction (PCR) primers in the flanking regions of these SSR loci [45]; the flanking regions were required to be at least 50 bp long. Excluding mononucleotide and unsatisfactory compound SSRs, a total of 2710 unigenes were used to design primers. The criteria for screening primer pairs were an optimum primer length of 20 bp, PCR product sizes ranging from 100 to 500 bp, a GC content between 40% and 60%, and an annealing temperature ranging from 50 °C to 62 °C. Moreover, forward and reverse primers were required to be free of secondary structures such as hairpins, dimers, false priming, and cross-dimers.
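Two of the screening criteria above, GC content and product size, are simple enough to check directly. The sketch below is a hypothetical post-filter written in Python, not the Primer3 interface; it deliberately omits the annealing-temperature and secondary-structure checks, which depend on thermodynamic models, and the primer sequences are invented.

```python
from dataclasses import dataclass


@dataclass
class PrimerPair:
    forward: str
    reverse: str
    product_size: int  # expected amplicon length in bp


def gc_percent(seq):
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_screen(pair, gc_range=(40.0, 60.0), size_range=(100, 500)):
    """Check the GC-content and product-size windows given in the text."""
    gc_ok = all(gc_range[0] <= gc_percent(p) <= gc_range[1]
                for p in (pair.forward, pair.reverse))
    size_ok = size_range[0] <= pair.product_size <= size_range[1]
    return gc_ok and size_ok


# Hypothetical 20 bp primers: the second pair fails because its reverse
# primer has only 20% GC.
good = PrimerPair("ATGCATGCATGCATGCATGC", "GCTAGCTAGCTAGCTAGCTA", 250)
bad = PrimerPair("ATGCATGCATGCATGCATGC", "AATTAATTAATTAATTGGCC", 250)
print(passes_screen(good), passes_screen(bad))
```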

Table 1. Frequencies of the different SSR repeat motif types observed in the Welsh onion transcriptome.

Table 2. Number and frequencies of the main repeat motif types.

Table 3. List of 22 Allium accessions used in the polymorphism analysis.

Table 4. Characteristics of the 22 polymorphic SSR markers validated in 22 Allium accessions.
a A: major allele frequency; b Ng: number of genotypes; c Na: number of alleles; d PIC: polymorphism information content.