Development and Characterization of Novel Genic-SSR Markers in Apple-Juniper Rust Pathogen Gymnosporangium yamadae (Pucciniales: Pucciniaceae) Using Next-Generation Sequencing

The Apple-Juniper rust, Gymnosporangium yamadae, is an economically important pathogen of apples and junipers in Asia. The absence of markers has hampered the study of the genetic diversity of this widespread pathogen. In our study, we developed twenty-two novel microsatellite markers for G. yamadae from randomly sequenced regions of the transcriptome, using next-generation sequencing methods. These polymorphic markers were also tested on 96 G. yamadae individuals from two geographical populations. The allele numbers ranged from 2 to 9 with an average value of 6 per locus. The polymorphism information content (PIC) values ranged from 0.099 to 0.782 with an average value of 0.48. Furthermore, the observed (HO) and expected (HE) heterozygosity ranged from 0.000 to 0.683 and 0.04 to 0.820, respectively. These novel developed microsatellites provide abundant molecular markers for investigating the genetic structure and genetic diversity of G. yamadae, which will help us to better understand disease epidemics and the origin and migration routes of the Apple-Juniper rust pathogen. Further studies will also be completed to dissect how human activities influence the formation of current population structures. Furthermore, these SSR (simple sequence repeat) markers can also be used as tools to identify virulence by mapping the whole genomes of different virulent populations. These markers will, thus, assist the development of effective risk-assessment models and management systems for the Apple-Juniper rust pathogen.


Introduction
The domesticated apple (Malus × domestica Borkh., family Rosaceae, tribe Pyreae) is the main fruit crop of temperate regions of the world [1]. The production of apples in most areas of northern China accounts for more than half of the world's output and, thus, plays an essential strategic role in export, agricultural structure adjustment, and farmer incomes [2,3]. However, hectares of apple production can be limited by a variety of diseases caused by fungi [4,5]. Gymnosporangium yamadae Miyabe, belonging to the Pucciniaceae family, is a serious rust pathogen of domestic apples as well as cultivated junipers, which causes much concern in Asia [6,7]. As one of the non-European Gymnosporangium species, it has been listed as an A1 quarantine organism by European and Mediterranean Plant Protection Organization (EPPO) [8]. In China, this species has a wide range of distribution throughout the country and reduces fruit yield by inhibiting photosynthesis and increasing respiration due to infection, which threatens the economic development of local orchards [9][10][11][12]. Under natural conditions, this pathogen is believed to be heteroecious (two taxonomically unrelated hosts are needed to complete its life cycle) and macrocyclic (0-spermatia, I-aeciospores, III-teliospores and IV-basidiospores) [6,7]. The telial host of G. yamadae, Juniperus chinensis L., exists commonly in parks, roadsides and graveyards, and the alternate aecial hosts, Malus baccata, M. domestica, M. prunifolia and M. toringo and other several species of Malus, are often planted in the same sites as J. chinensis. With the increased movement of junipers as nursery stock and cultivated apple as favored fruits for consumption, the risk of disease occurrence is rather high [13]. Although the primary method of rust control is the application of protective or systemic fungicides and blocking of the infection cycle, the most popular method for disease control is the use of resistant varieties, due to its efficiency, low cost and reduced impact on the environment [14].
Rust fungi are obligate parasites that only infect specific plant genera or species [7], it shows genetic diversity [15] and there are many studies focused on the ability of rust fungi to adapt. There has been a long history of breeders fighting against the appearance of new rust races that have overcome resistance genes in crops and trees [16][17][18][19]. Twenty Malus species have been tested by inoculation experiments in 1984 and the results indicated that only Malus halliana Koehne showed high resistance to infection by basidiospores of G. yamadae [20]. However, the genetic diversity, population structure and gene flow of the G. yamadae population remained unclear due to the lack of research on molecular markers, which hindered the effective prevention and control of this species based on genomic information.
For the wheat stem rust pathogen, Puccinia graminis f. sp. tritici, and wheat stripe rust pathogen, P. striiformis f. sp. tritici, the population structure has been studied using molecular markers since the 1980s [21][22][23]. These markers proved useful in studying genetic diversity; however, each has limitations in application, such as the issue of replicability, ambiguous fragment size, and homoplasy. Especially, in the case of certain dikaryotic organisms, codominant markers such as simple sequence repeats (SSRs-microsatellite) have been shown to be more informative in revealing genetic variation compared to dominant markers [24,25]. Microsatellites are regions of DNA consisting of short, tandemly repeated motifs (1-6 bp in length), which have a high level of polymorphism and are widely dispersed in the genomes of all prokaryotic and eukaryotic organisms [24,26]. Microsatellite analyses are relatively simple and easier to score than other genetic markers (e.g., restriction fragment length polymorphism (RFLP), random amplified polymorphic DNA (RAPD) and amplified fragment length polymorphism (AFLP) because their gel band patterns are easy to detect, producing unambiguous results.
Thus, microsatellites have recently been developed and validated in many rust fungi, such as P. striiformis f. sp. tritici, P. graminis f. sp. tritici, P. coronata, P. novopanici, P. triticina and coffee rust Hemileia vastatrix [22,[27][28][29][30][31][32][33][34][35]. However, no microsatellite primers specific to G. yamadae have been developed to date, so microsatellites for the studies of G. yamadae are urgently required. The traditional approaches to microsatellite development, such as massive-scale cloning and sequencing of DNA or EST libraries, are labor intensive and time-consuming. With the rapid development of next-generation sequencing technology, an alternative approach to identify a great number of novel microsatellites based on large-scale RNA-seq became fast, cost-effective and reliable, especially for non-model species [36].
In our study, we used NGS (next generation sequencing) to sequence the G. yamadae transcriptome and screened transcript sequences obtained by de novo assembly to detect microsatellites. We then validated the SSR primer sets using G. yamadae samples from two geographical areas in China. This is the first time that novel microsatellites for G. yamadae have been identified. The developed markers will be important resources to further determine the genetic diversity and genetic structure of this severe pathogen.

Pair-End Sequences Obtained with the Illumina de Novo Sequencing
In our study, we built the transcriptome of G. yamadae during the telial stage of its lifecycle. Three biological replicates were collected, and cDNA libraries were independently constructed and subsequently sequenced using an Illumina Hi-Seq 2000 platform. Approximately 150 million 100-bp paired end (PE) raw reads were generated from each G. yamadae cDNA library (Table 1). These data have been deposited in the NCBI Sequence Read Archive under accession numbers SRR5167035-SRR5167040. After removing the adapters and low Phred quality sequences, approximately 150 million high-quality sequence reads were obtained from the three G. yamadae libraries ( Table 1). The trimmed reads were then assembled into transcriptomes using the Trinity package [37]; 49,318 transcripts, with a mean length of 1006 bp and an N50 value of 1957, were obtained for G. yamadae. Using overlapping information from high-quality reads, we identified 35,102 unigenes for G. yamadae, with average lengths of 756 (Table 2).

Mining of Genic SSRs
A total of 35,102 unigenes covering 26,521,164 bp of the G. yamadae genome were identified in our data. Of these, 3942 sequences with SSR were detected, and 1136 sequences contained more than one SSR.

Distribution of SSR Motif
The proportions of six different SSR unit sizes were not evenly distributed among all SSRs. Different repeat units occurred at the following frequencies: 37.27% for mono-nucleotides, 34.88% for di-nucleotides, and 23.90% for tri-nucleotides; these accounted for the largest proportions. An additional 2.55%, 0.55%, and 0.87% of the repeats were tetra-nucleotides, penta-nucleotides and hexa-nucleotides, respectively (Table 3). We divided the motif repeat types into seven groups. A repeat number of 5-8 was abundant in the di-and tri-nucleotide motif types, and a repeat number of 9-12 was abundant in the mono-nucleotide motif type. Of the 1884 mono-nucleotide motifs, the (A/T) n mono-nucleotide repeat motif was the most abundant in the dataset. The three other main unit types were (AT/AT) n di-nucleotides, (ACC/GGT) n and (ATC/ATG) n in tri-nucleotides and (AAAT/ATTT) n in tetra-nucleotides, which occurred at frequencies of 15.43%, 4.23%, and 0.26%, respectively (Table 3).  Further analysis indicated that the copy numbers of different repeat motifs were distributed unevenly ( Table 4). The copy numbers of different motif types ranged from 5 to 24, and the four most abundant copy numbers for SSRs were 10 (24.54%), 6 (18.31%), 5 (14.48%), and 7 (10.62%). According to the distribution of microsatellites ( Figure 1, Table 4), it seems that the number of loci decrease with the increase in corresponding motif repeats. Table 4. Frequency of different repeat motif in SSRs.

PCR Amplification and Polymorphism of Genic-SSRs
Using the software Primer 3 [38], 2832 SSR primers were designed according to the flanking sequences of the 3942 SSR-containing sequences. Of these, 88 primers were randomly selected and synthesized for initial validation on eight individuals. The information about the selected primers is given in Table S1, and the information about the specimens used for amplification is given in Table S2. The PCR amplification results were evaluated by gel electrophoresis. Of the 88 primers, PCR amplicons were not obtained for 40 markers (45%) (Table S3), and the successfully amplified markers used are described in Supplementary Materials. The remaining primers (from previous steps) were tested on twenty individuals (Table S4) of G. yamadae using a capillary sequencer, whose forward primers were modified by the PC tail (Primer tail C) (5 CAGGACCAGGCTACCGTG 3 ) to test their genotyping efficiency and marker polymorphism in G. yamadae. Among the 48 primers that produced amplicons, 26 (54%) were not polymorphic (Table S5), including eighteen di-nucleotide, four tri-nucleotide, one penta-nucleotide and one hexa-nucleotide repeat markers. The remaining 22 loci have polymorphic amplification. These markers can serve as candidate markers for future research, such as genetic diversity and relatedness analysis of different populations (Table S6).

Characteristics of Validated Microsatellite Loci
The polymorphic markers obtained were further assessed using two natural G. yamadae populations, including 50 individuals from Shaanxi province and 46 individuals from Beijing, China (Table S7). The 22 microsatellite markers had allele numbers ranging from 2 to 9 with an average value of 6 per locus. The polymorphism information content (PIC) values ranged from 0.099 to 0.782 with an average value of 0.48. The observed (H O ) and expected (H E ) heterozygosity ranged from 0.000 to 0.683 and 0.04 to 0.820, respectively. The inbreeding coefficient (F IS ) ranged from −0.118 to 1.000. Hardy-Weinberg equilibrium (HWE) was tested in two geographical populations; 17 loci in the Shaanxi population and 18 loci in Beijing province departed from HWE (p < 0.05) ( Table 5). Among these loci, GY14, GY18, GY19, GY27, GY42, GY66, GY67, GY68, GY70, GY72 and GY79 showed significant departures from HWE in both populations (p < 0.001).  The Bayesian algorithm implemented in the program STRUCTURE indicated that the model K = 3 explained the data in a satisfactory manner, because this model resulted in the highest ∆K value (Figure 2). The population structure of G. yamadae was inferred with the dataset of 22 microsatellite markers, and the 96 individuals from two geographical populations were grouped into three clusters, as shown in Figure 3.  The topology of the unweighted pair-group method analysis (UPGMA) tree based on genetic distance shows the relationship between the two populations ( Figure 4). Eight individuals from Shaanxi province and eight individuals from Beijing were grouped into independent clusters, which are highlighted in yellow and pink, respectively. These two clusters contributed 1/6 of the total tested individuals. The main cluster, marked in purple, is a mixed group that contains individuals from both geographical areas; however, sub-branches in this cluster were independent from each other and related to geographical distribution, with some exceptions, e.g., individuals 69, 70 and 75 from Beijing were clustered with the Shaanxi individuals ( Figure 4).

Discussion
As one of the most ubiquitous pathogens of apples, G. yamadae has negative impacts on apple production as well as on cultivated junipers in China. However, the origin, introduction pathways and current population structure of G. yamadae remain unknown. The lack of a significant set of polymorphic microsatellite markers for G. yamadae and other Gymnosporangium species was one of the main justifications of the present study.
In this study, 22 polymorphic microsatellite markers were developed for G. yamadae based on transcriptome data sequenced by NGS. Microsatellites proved to be informative markers that can reveal heterozygotic or heterokaryotic genotypes in diploid or dikaryotic organisms [21,25,35]. Moreover, microsatellites are easy-to-use tools for population genetic and ecological studies as well as epidemiological and forensic research [27,29]. The characterization of SSRs requires the identification of genomic fragments containing repetitive sequence motifs [26]. Compared with the library-enriched method, recent advances in large-scale RNA-seq provide a fast, cost-effective and reliable approach for the generation of large expression datasets in non-model species [39][40][41]. We have described the efficient use of NGS to obtain a large amount of sequence data and applied bioinformatics tools to develop a novel sample of 3942 sequence-containing microsatellites for this species. The di-nucleotide types have generally been observed to have the highest frequency in many other rusts [30][31][32][33][34][35]. Compared with amplification in other fungi [22,[27][28][29][30][31][32][33][34][35], the PCR amplification efficiency of the primers used here seems relatively low. Gymnosporangium species are demicyclic and we could not collect massive spores by inoculation experiments. Therefore, we extracted DNA directly from a sorus on infected host plants. Some specimens with few spores yielded relatively low-quality DNA and a low percentage of amplicons (48 out of 88).
The stabilization of genetic diversity is very important for the long-term interest of any species [42]. High level genetic diversity is related with a certain population size, for the loss of heterozygosity could have a negative effect on population fitness. However, the polymorphism information content (PIC) and alleles per locus were relatively low compared to those of other rusts [30][31][32][33][34][35], with average values of 0.48 and 6, respectively, which suggested that low genetic diversity was detected based on microsatellite markers. Furthermore, eleven microsatellite sites in both tested populations deviated significantly from Hardy-Weinberg equilibrium (p < 0.001), which may have resulted from heterozygote deficiency, because H O is much lower than H E in these loci (Table 5). Heterozygote deficiency can be caused by the presence of null alleles, small sample size, or the effects of population subdivision (Wahlund's effect) [43].
The individuals from Beijing and Shaanxi were grouped into three main clusters. The largest cluster was a mixed group consisting of two geographical populations. One of the possible reasons for this observation is the frequent trans-provincial transportation of infected Juniperus seedlings all over the country. J. chinensis L. is one of the most popular landscaping plants in China and can be found in many areas, such as gardens, parks, roadsides, cemeteries and so on. However, during our collection in the field, we found that many seedlings with overwintering fungus galls had been recently planted locally, meaning that it is very likely that these seedlings were transported from another city or province. Usually, after a heavy spring rain, the telial horns extrude from the branch galls on the host juniper trees and become gelatinous; these structures can then produce basidiospores that are wind-blown to local Malus hosts. If the infection in Malus plants is successful, spermogonia develop in orange lesions on the upper surfaces of leaves. After a period of time, aecia also develop in the same orange lesions, but on the lower surfaces of the leaves. Mature aeciospores (spores produced in the aecia) are wind-blown to the juniper trees in the same year from early summer to fall. If infection is successful in the juniper host, a gall forms, and the telial horns grow from these symptomatic tissues the following spring or in subsequent years, concluding a complete life-cycle of G. yamadae. All this leads to a frequent regional gene exchange; thus, we could infer that some specimens collected in Beijing probably were from Shaanxi province or have an intimate relationship with those specimens.
Another likely reason for the observed clustering may be the limited number of specimens. The sample size should be large enough to make sure the results are an authentic reflection of the true ecological pattern rather than being an occasional phenomenon [44]. However, the sample size obtained in reality may be limited by many factors, such as budget, time and available samples. In a future study, we plan to collect many samples in each geographical area, and these samples would better cover enough regions where G. yamadae is potentially distributed to provide a more accurate conclusion about the genetic diversity of G. yamadae in China. Anyhow, microsatellites developed in this study are still effective molecular markers for detecting the polymorphism in G. yamadae populations.
Regardless of the above limitations, the microsatellites developed in this study will help researchers to better identify the main genetic groups in G. yamadae populations as well as the geographical relationships among geographically spaced populations. As a result, we will have novel insights into the importance of human activity in shaping the current structure of the G. yamadae population in China. Moreover, with the increasing genomic resources of G. yamadae, these informative SSR markers will help to identify virulence mechanisms associated with rust biology by mapping them onto the whole-genome sequences of populations segregated for virulence.
In summary, these highly informative SSR markers prove to be valuable molecular tools to study the aetiology of apple-juniper rust, which will help with the development of effective strategies for its management.

Rust Sample Collection
Samples of G. yamadae analysed in RNA-seq experiments were collected from an infected cypress tree (Juniperus chinensis.) prior to telial horn gelatinization at Northwest A & F University, Yangling, Shaanxi, China, in March 2016. Each sample was harvested in three biological replicates using liquid nitrogen and stored at −80 • C before RNA extraction. Each biological replicate was used for RNA extraction and RNA-Seq library construction.
The 96 samples prepared to be used to validate the polymorphism of G. yamadae microsatellites were collected in two geographic regions in China, of which 50 samples were from Shaanxi, and 46 samples came from Beijing. Likewise, all these samples were stored in absolute ethanol and frozen at −80 • C prior to use.

RNA Isolation and Library Preparation
An approximately 50 mg sample of teliospores from each replicate was used for total RNA extraction following a previously described protocol [45], and total mRNA was then purified using oligo (dT) magnetic beads. cDNA libraries were prepared according to Illumina sequencing sample preparation protocols. In total, 3 paired-end cDNA libraries were constructed, and sequenced with an Illumina HiSeq 2000 platform by Novogene Bioinformatics Technology Co., Ltd. (Beijing, China), and 100-bp paired-end reads were generated.

Reprocessing of Illumina Raw Data and de Novo Transcriptome Assembly
Using Trimmomatic software [46], clean data were obtained by removing reads containing adapter, reads containing ploy-A sequences (≥10%) and low quality reads (sQ ≤ 5) from raw data. Simultaneously, the Q20, Q30, GC content and sequence duplication level of the clean data were calculated. Transcriptome assembly was accomplished using Trinity software [37] with default parameters with min_kmer_cov set to 2 by default and all other parameters were set to the default values.

SSR Detection and Development of Primers
The microsatellite isolation from the genomic sequences was conducted by MISA software [47], and the primers for selected loci were designed with Primer 3 [38], based on the following core criteria: G/C content between 40% and 70%, annealing temperature between 57 • C and 61 • C, a product size ranging from 100 bp to 280 bp. The remaining parameters were at default settings. All candidate SSR primer pairs were synthesized by Ruibiotech Ltd. (Beijing, China).

DNA Extraction, Primer Testing and Polymorphism Detection
First, to validate the SSR markers, the genomic DNA of the 96 samples were extracted from single spores by proteinase K digestion followed by a standard phenol-chloroform method.
Second, we randomly selected 64 primers with the basic principle of having no mono-nucleotide motif repeats and a large repeat number and added a PC tail (Primer tail C) (5 CAGGACCAGGCTACCGTG 3 ) to the 5 end of all candidate forward primers. Eight G. yamadae individuals were selected for the initial test. Amplification was performed in a final 10 µL reaction mixture containing 0. Third, the polymorphism of the primers validated in the previous step was further tested using a capillary sequencer. The PCR programme for amplification was the same as above. Amplification was performed in a final volume of 10 µL, containing 0.5 µL of template DNA, 5 µL of Master Mix (Promega, Madison, WT, USA), 0.08 µL of forward primer (modified by the PC tail) at a final concentration 0.08 µM, 0.16 µL(10 µM) of reverse primer at a final concentration 0.16 µM, 0.32 µL of fluorescently modified PC tail (FAM(blue), Hex(green), Rox(red)), including different colours at a final concentration of 0.32 µM, and 3.94 µL of ddH2O. The amplicons were visualized using a Gel Doc EZ system (Bio-rad, Berkeley, California, USA) and precise allele band sizing was conducted on the ABI 3830xl DNA analyser (Applied Biosystems, Froster, CA, USA) with the GeneScan 500LIZ size standard (Applied Biosystems, Froster, CA, USA).
Finally, 96 samples from the two regions were used to validate the effectiveness of the primers screened by the above steps. The amplification mixture, programme, and analysis of the PCR fragments were the same as in the second step.

Statistical Analysis
We used Popgene V1.32 [48] to calculate the number of alleles, observed heterozygosity (H O ), expected heterozygosity (H E ) and Shannon information index (I) and Powermarker V3.25 [49] to determine the inbreeding coefficient (Fis), polymorphism information content (PIC), deviations from the Hardy-Weinberg equilibrium (HWE), genetic distance and UPGMA tree based on the genetic distance. The possibility of the cryptic population structure of two geographical individuals were detected by STRUCTURE 2.2 [50]. Markov chain Monte Carlo (MCMC) consisted of a 100,000-iteration burn-in followed by 1,000,000 iterations. The simulated K values ranged from 2 to 7. Ten independent runs were implemented for each specific K value to verify the consistency of the results. The ad hoc estimated likelihood of K (∆K) was used to determine the most likely number of populations (K) based on the rate of change in the log probability of the data (LnP(D)) [51].

Conclusions
In this study, we first characterized and developed several polymorphic microsatellite markers for G. yamadae from random regions of the transcriptome identified using next-generation sequencing. The new markers detected high levels of DNA polymorphism in G. yamadae and could be used to assess genetic diversity in apple-juniper rust. The loci assessed in our study revealed the genetic structure of two geographical populations. This approach provides a rapid method for high-throughput development of microsatellite markers from non-model species without reference genomes. These highly informative SSR markers will be potentially powerful tools to study genetic diversity and population structures of G. yamadae in China. In addition, they also facilitate a better understanding of the geographical origin and migration routes of G. yamadae, which will determine the influence human activity on the shaping of the population structures. Furthermore, comparative studies of markers mapped in populations of differentiating virulence can be used to identify virulence factors related to infection as well as develop apple resistance genotypes. Above all, these SSR markers will provide us with a relatively accurate understanding of the relationship within geographical populations and recent migration routes of G. yamadae in an upcoming study, which is crucial to understanding disease epidemics, developing risk-assessment models and slowing down the process of disease spread.