SLAF-seq Uncovers the Genetic Diversity and Adaptation of Chinese Elm ( Ulmus parvifolia ) in Eastern China

: The Chinese elm is an important tree ecologically; however, little is known about its genetic diversity and adaptation mechanisms. In this study, a total of 107 individuals collected from seven natural populations in eastern China were investigated by speciﬁc locus ampliﬁed fragment sequencing (SLAF-seq). Based on the single nucleotide polymorphisms (SNPs) detected by SLAF-seq, genetic diversity and markers associated with climate variables were identiﬁed. All seven populations showed medium genetic diversity, with PIC values ranging from 0.2632 to 0.2761. AMOVA and Fst indicated that a low genetic di ﬀ erentiation existed among populations. Environmental association analyses with three climate variables (annual rainfall, annual average temperature, and altitude) resulted in, altogether, 43 and 30 putative adaptive loci by Bayenv2 and LFMM, respectively. Five adaptive genes were annotated, which were related to the functions of glycosylation, peroxisome synthesis, nucleic acid metabolism, energy metabolism, and signaling. This study was the ﬁrst on the genetic diversity and local adaptation in Chinese elms, and the results will be helpful in future work on molecular breeding.


Introduction
Chinese elm (Ulmus parvifolia), which is native to China, Japan and Korea, has become a widely distributed ornamental tree that is frequently planted on lawns, along streets and in parks [1]. In China, the wild resources of U. parvifolia are mainly located in the northern and eastern areas, exhibiting a wide range of adaptation. Within this area, Chinese elm is recognized as a drought, heat, and cold tolerant tree [2][3][4]. Nevertheless, as global climate alteration will happen in the near future, it remains questionable to what degree the speed of future adaptation can keep up with the pace of climate change [5]. Therefore, an in-depth understanding of the genetic diversity and the genetic regulation of adaptation in Chinese elms is essential. Revealing polymorphisms and genes that determine adaptation would provide the basis for breeding genetically improved germplasms that could be used in changing environments.
Genetic diversity is the maximum of genetic variation presented in the genetic makeup of a specific species [6]. It is an important component of species biodiversity. Monitoring the genetic diversity of natural populations is of paramount importance, since it could shed light on the population structure, history, ecology, and adaptation of the species [7]. Local adaptation occurs gradually over time, with relatively long generation times. During the adaptation process, alleles that are best fitted to the specific climate gradually prevail through positive selection [8]. Those alleles, once identified, can give new insights into plant adaptive evolution, as well as be utilized for future molecular breeding.
Previous research on genetic diversity and local adaptation of plants has been conducted at the DNA-based molecular level, such as simple sequence repeat (SSR), inter-simple sequence repeat (ISSR), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), and single nucleotide polymorphism (SNP) [7,9,10]. SNPs are genome sequence variations that occur when there is a single nucleotide change in the DNA sequence [11]. SNPs are the most abundant and stable type of DNA variation in a genome, therefore, the density of SNP markers is much higher than any other molecular markers [3]. Nowadays, reduced representation sequencing, such as genotyping-by-sequencing (GBS) and specific locus amplified fragment sequencing (SLAF-seq), has been used to quickly and efficiently identify numerous SNPs in plants [12,13]. As reduced representation sequencing can be performed without a reference genome, it has been tested on many kinds of trees, such as pecans [14], Japanese conifers [10], and masson pine [15].
To date, reports regarding the genetic diversity and adaptive mechanisms in Chinese elms remain remarkably scant. In this study, we attempt to explore the genetic diversity of seven natural populations of Chinese elms in eastern China, and then identify the potential local adaptation genes based on SLAF-seq identified SNPs. Our results might help in the marker-assisted breeding of Chinese elms in the future.

Plant Materials
Natural populations of Ulmus parvifolia were investigated in the present study. A total of seven populations with 107 individuals were collected from Jiangsu Province (XZ, JN, CS), Anhui Province (HUOS, HS), and Zhejiang Province (FY, LH). For each population, 13~17 individuals were sampled, with individuals at least 300 m apart. Collection details and climate information for the seven populations are summarized in Table 1 and Figure 1 Young healthy leaves were sampled and stored at −80 • C until further use.
Forests 2020, 11, x FOR PEER REVIEW 2 of 14 once identified, can give new insights into plant adaptive evolution, as well as be utilized for future molecular breeding. Previous research on genetic diversity and local adaptation of plants has been conducted at the DNA-based molecular level, such as simple sequence repeat (SSR), inter-simple sequence repeat (ISSR), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), and single nucleotide polymorphism (SNP) [7,9,10]. SNPs are genome sequence variations that occur when there is a single nucleotide change in the DNA sequence [11]. SNPs are the most abundant and stable type of DNA variation in a genome, therefore, the density of SNP markers is much higher than any other molecular markers [3]. Nowadays, reduced representation sequencing, such as genotyping-by-sequencing (GBS) and specific locus amplified fragment sequencing (SLAFseq), has been used to quickly and efficiently identify numerous SNPs in plants [12,13]. As reduced representation sequencing can be performed without a reference genome, it has been tested on many kinds of trees, such as pecans [14], Japanese conifers [10], and masson pine [15].
To date, reports regarding the genetic diversity and adaptive mechanisms in Chinese elms remain remarkably scant. In this study, we attempt to explore the genetic diversity of seven natural populations of Chinese elms in eastern China, and then identify the potential local adaptation genes based on SLAF-seq identified SNPs. Our results might help in the marker-assisted breeding of Chinese elms in the future.

Plant Materials
Natural populations of Ulmus parvifolia were investigated in the present study. A total of seven populations with 107 individuals were collected from Jiangsu Province (XZ, JN, CS), Anhui Province (HUOS, HS), and Zhejiang Province (FY, LH). For each population, 13~17 individuals were sampled, with individuals at least 300 m apart. Collection details and climate information for the seven populations are summarized in Table 1 and Figure 1 Young healthy leaves were sampled and stored at −80 °C until further use.

High-Throughput Sequencing
About 20 mg of leaves were used for genomic DNA extraction via the DNeasy Plant Pro Kit (Qiagen, Hilden, Germany). DNA concentration and quality were assessed with a Nanodrop 1000 Spectrophotometer (Thermo Fisher Scientific, Wilmington, MA, USA) and 2% agarose gel electrophoresis. Quantified DNA samples were diluted to 100 ng/µL for the subsequent SLAF-seq analysis. SLAF-seq was performed according to a previous report [12], with some modifications. Since the genome of Chinese elm has not been published, we used the Trema orientale (the same species in Ulmaceae) for the prediction of enzyme digestion. Briefly, the reference genome of Trema orientale was used to perform marker discovery surveys through simulating in silico the number of markers obtained by various restriction enzymes. To get >100,000 SLAF tags that were evenly distributed in the genome, two restriction enzymes, HinCII and HaeIII, were finally selected. The efficiency of enzyme digestion was of importance for the reduced-representation sequencing. For the present study, Oryza sativum ssp. japonica DNA with a high-quality genomic information was used as a control to evaluate the quality of enzyme digestion. Following digestion, a single nucleotide (A) was added to the 3 end using dATP at 37 • C, and then Dual-index adapters were ligated to the A-tailed DNA fragments. PCR amplification was subsequently performed using diluted restriction-ligation DNA as the template. The products of PCR were purified and pooled together. DNA fragments that were 414-464 in length were collected from agarose gel, and were chosen as SLAF tags. High-throughput sequencing was performed using an Illumina-HiSeq TM 2500 sequencing platform (Illumina, Inc.; San Diago, CA, USA) at Beijing Biomarker Technologies Corporation (Beijing, China).

SNP Calling
Raw reads generated from the sequencing platform were first qualified through removing the adapter sequence included in the raw reads, low-quality reads (quality scores < 20), and empty reads (reads just contained adapter sequence). High quality paired-end reads were clustered using the BLAT software based on sequence similarity [16]. Sequences with over 90% similarity among different individuals were identified as one SLAF locus [12]. Samtools [17] and the Genome Analysis Toolkit (GATK) [18] were used for SNP calling, and their intersection was considered to indicate reliable SNPs. For the phylogenetic analysis, SNPs with a minor allele frequency (MAF) < 5% and missing rate > 0.2 were filtered.

Diversity Analysis
A total of 457,888 SNPs from 107 individuals were developed to calculate the genetic diversity and population structure. The commonly used indexes of genetic diversity, including the observed allele number (Na), expected allele number (Ne), observed heterozygous number (Ho), expected heterozygous number (He), Nei's diversity index (H), Shannon's wiener index (I), and polymorphism information content (PIC), were calculated by POPGENE [19]. These indexes were calculated to estimate the degree of allele distribution (Na and Ne), genomic heterozygosity (Ho and He), gene diversity (H and I) and DNA polymorphism (PIC). In order to assess the population differentiation, Analysis of molecular variance (AMOVA) was calculated to estimate the partitioning of genetic variance among populations. Meanwhile, pairwise fixation index (Fst) among populations was also computed to detect how gene diversity was partitioned at each level. Inter-individual fixation index (FIS) was analyzed to determine the deviation of genotype frequencies from Hardy-Weinberg proportions within each population. AMOVA, Fst, and FIS were estimated by Arlequin [20].
The phylogenetic tree was constructed by MEGA X software with the following parameters: neighbor-joining method, Kimura 2-parameter model, and 1000 bootstrap replicates. The population structure was analyzed by Admixture [21], on the basis of the maximum-likelihood method. The number of populations (K), ranging from 1 to 10, was tested, and each individual was assigned to its respective populations according to the maximum membership probability.

Climatic Association Analysis
Two programs, Bayenv2 [22,23] and LFMM [24], were used to detect outlier loci that were possibly associated with climatic variables. First, we used Bayenv2 to detect correlations between SNP allele frequencies and environmental variables. A covariance matrix of allele frequencies was estimated across populations using the full set of SNPs to avoid population-specific effects. For each tested SNP, this program generated a Bayes factor (BF) and nonparametric Spearman's rank correlation coefficient (ρ) based on the Markov chain Monte Carlo (MCMC). In this study, the significance threshold for the putative adaptive makers were those ranked among the top 1% of BF values (log 10 BF > 2.75) and top 5% of ρ values. The other software, LFMM, was also used for gene-climate association analysis. As it estimates the hidden impact of population structure, LFMM permits the presence of background levels of population structure (latent factors). The detected SNPs that exhibit an association with the environment were determined according to the z-score. Bonferroni adjustment was used on the z-score values for multiple tests. Markers with z-scores > 2.8 and a p-value < 0.01 were considered to be significant. Putative functions for the identified outlier loci were annotated using the NCBI and UniProt databases.

SNP Detection
High-throughput sequencing based on SLAF generated a total of 439.74 M pair-end reads, with a mean GC content of 42.90%, and an average Q30 of 96.70%. We obtained a total of 2,059,418 high-quality SLAF tags for the 107 samples, with an average depth of 18.88x for each SLAF (Table 2). For the SLAF tags, 529,271 were polymorphic. These polymorphic SLAFs contained 4,138,972 SNPs in total, and 457,888 of them were utilized in further analysis after applying the filtering criteria.

Genetic Diversity and Genetic Differentiation
The value of the observed allele number (Na) was 2 across populations, and the values of the expected allele number (Ne) ranged from 1. The maximum value of PIC was presented in the FY population, while the minimum value was found in the XZ population. As a measure of intragametophytic selfing, F IS were low in our study, varying from −0.03849 (FY population) to 0.06769 (HS population) ( Table 3). All the F IS were on Hardy-Weinberg equilibrium (p > 0.05). The pairwise fixation index (Fst) is a measure of genetic differentiation among populations. In our study, the lowest genetic differentiation existed between the HS and HUOS populations, with an Fst value of 0.00712. The LH and XZ populations presented the largest genetic differentiation, with an Fst value of 0.09106 (Table 4). AMOVA indicated that the maximum diversity occurred within individuals (92.22%), while the minimum diversity presented among individuals within populations (3.54%). A total of 4.24% of the genetic variation occurred among populations (Table 5).

Phylogenetic Relationship and Population Structure
The genetic relationships of the 107 individuals were exemplified by a phylogenetic tree. Interestingly, we found that the individuals could not be divided into distinct clades, which indicates a weak population structure of the individuals (Figure 2). Generally, individuals in the same subclade were from the same population ( Figure 2).
The genetic structure of the U. parvifolia populations was assessed with the Admixture software. As shown in Figure 3, the lowest K-values were detected when K = 1, indicating that a weak population structure existed in the individuals. A relatively low K-values was seen when K = 2, and correspondingly, the 107 individuals could be categorized into two groups. Group I contained 91 individuals, which were mainly from the FY, MH, HS, XZ, JN, and CS populations. Group II consisted only of 16 individuals from the LH population ( Figure 4). Individuals with a low degree of admixture were seen from all the studied populations.

Association Between SNP Markers and Environmental Variables
The association analysis of SNPs and environmental variables was conducted by the Bayenv2 and LFMM programs. Bayenv2 analysis identified a total of 43 SNP markers showing significant correlation with the environmental variables. Of these, 8, 10, and 25 markers were associated with altitude, annual rainfall, and annual average temperature, respectively (Table 6). A set of 30 markers associated with climatic variables was obtained by the LFMM program. The highest number of associations was for temperature, which was related to 16 markers; the annual rainfall and altitude were correlated with 4 and 10 markers, respectively (Table 7). Blast searches indicated that five of the correlated SNP markers could be annotated. Two markers (Marker204041 and Marker76627) associated with altitude could be annotated to the DEAD-box helicase and V-type proton ATPase genes, respectively. The SNP markers, Marker68303 and Marker129188, correlated with annual rainfall, were found in the regions of the UDP-glycosyltransferase (UGT) and peroxisome biogenesis protein genes, respectively. The SNP marker, Marker87380, associated with annual average temperature, seems to underlie the Cysteine-rich receptor-like protein kinase gene (Table 8).

Association between SNP Markers and Environmental Variables
The association analysis of SNPs and environmental variables was conducted by the Bayenv2 and LFMM programs. Bayenv2 analysis identified a total of 43 SNP markers showing significant correlation with the environmental variables. Of these, 8, 10, and 25 markers were associated with altitude, annual rainfall, and annual average temperature, respectively (Table 6). A set of 30 markers associated with climatic variables was obtained by the LFMM program. The highest number of associations was for temperature, which was related to 16 markers; the annual rainfall and altitude were correlated with 4 and 10 markers, respectively (Table 7). Blast searches indicated that five of the correlated SNP markers could be annotated. Two markers (Marker204041 and Marker76627) associated with altitude could be annotated to the DEAD-box helicase and V-type proton ATPase genes, respectively. The SNP markers, Marker68303 and Marker129188, correlated with annual rainfall, were found in the regions of the UDP-glycosyltransferase (UGT) and peroxisome biogenesis protein genes, respectively. The SNP marker, Marker87380, associated with annual average temperature, seems to underlie the Cysteine-rich receptor-like protein kinase gene (Table 8).

Discussion
The present study is the first attempt to use SNPs derived from SLAF to assess the genetic diversity and explore the adaptation mechanisms of Chinese elms. In recent years, SLAF-seq technology has become a low-cost technique to effectively develop reliable SNP and InDel markers for genome-wide association analysis and high-density genetic map construction [25,26]. Our study identified a total of 4,138,972 SNPs and selected 457,888 SNPs with MAF > 5% and a missing rate < 0.2 for further analysis. The number of molecular markers was dramatically larger than that in previous reports on elm species [27,28], which facilitates precise genetic analysis.
Heterozygosity is an important measure of overall genetic diversity [25]. In our study, the Ho and He values ranged from 0.1483 to 0.1822 (an average of 0.1599) and 0.3236 to 0.3427 (an average of 0.3315), respectively (Table 3). These values were lower than the results observed in other trees [25,29]. A relative lower level of genetic heterozygosity for the Chinese elms might be due to the existence of spatial isolation in different groups, hindering the gene communication between individuals to some extent. The index, PIC, measures the degree of informativeness of a genetic marker, with values ranging from 0 to 1 [29]. A locus with a PIC value of 0 is undesirable [30]. When PIC < 0.25, it indicates a low polymorphism, and 0.25 < PIC < 0.50 represents a median polymorphism. In contrast, PIC > 0.50 is indicative of high polymorphism [31]. According to this criteria, as the PIC values were between 0.2632 to 0.2761 (Table 3), the tested seven populations in our study possessed medium genetic diversity in terms of PIC. The inter-individual fixation index (FIS) measures the deviation of genotype frequencies from Hardy-Weinberg proportions within each population. A negative FIS indicates heterozygote excess (outbreeding), while a positive value reflects a deficiency in heterozygosity (inbreeding) [32]. In our study, the FY population presented a negative FIS (−0.03849) ( Table 3), suggesting a slight excess of heterozygotes. The other six populations exhibited a positive FIS (Table 3). All the populations were not statistically significantly deviated from the Hardy-Weinberg equilibrium (p > 0.05), indicating a relatively random mating for these populations. Overall, the FY population displayed higher genetic diversity than the other six populations, which was supported by the larger Ne, Ho, He, H, I, and PIC values (Table 3).
Outcrossing woody plants tend to possess low levels of genetic differentiation among populations [33]. In the current study, differentiation among populations was estimated by Fst values. Fst > 0.25 signifies a great genetic differentiation, 0.25 > Fst > 0.15 indicates a moderate genetic differentiation, 0.15 > Fst > 0.05 means a small genetic differentiation, and Fst < 0.05 represents negligible genetic differentiation [34]. Based on this standard, a low genetic differentiation was found among the studied populations (Fst values ranging from 0.00712 to 0.09106) (Table 4). Additionally, AMOVA analysis (Table 5) also indicated a low percentage of variation (4.24%) among populations. Similar results could be found in other trees [7,35].
Investigating the population structure of tested individuals is the premise for association analysis, since the presence of the population structure could affect the validity of association results [36][37][38]. In our study, the optimal K value of the seven populations was 1 (Figure 2), indicating no population structure existed in the studied groups. The geographic boundaries had a weak effect on the genetic structure of Chinese elms. The existence of population structure might cause correlations between unlinked locis, and would usually result in increased false associations, the weak population structure of Chinese elm in our study would be conducive to subsequent association analysis.
Natural selection has an important impact on shaping the genetic variation of a population, and therefore promotes local adaptation [39]. In this research, based on the identified SNP markers, an association study was used to uncover the hidden genetic basis of local adaptation. The associated SNP markers were blasted against public databases for putative genes. We found that the genes of DEAD-box helicase and V-type proton ATPase seemed to be candidates for adaptation to altitude. DEAD-box helicase is involved in nucleic acid metabolism functions, such as transcription, translation, replication, repair, recombination, ribosome biogenesis and splicing, which control plant grow and development [40]. V-type ATPase, as a transporter, is essential for energy metabolism and maintenance of solute homeostasis, which makes it indispensable for plant growth [41,42]. V-type proton ATPase has been shown to play a significant role in plant adaptation to stressful growth conditions [42]. We deduced that variations in altitude would lead to a difference in plant growth according to the functions of DEAD-box helicase and V-type proton ATPase.
UDP-glycosyltransferase (UGT) and peroxisome biogenesis protein were associated with annual rainfall variable. UGT belongs to the glycosyltransferase (GT) multigene family [43]. In plants, GTs are a ubiquitous group of enzymes involved in the glycosylation process, and glycosylation leads to the formation of glycosylated secondary chemicals such as flavonols, anthocyanins, and plant hormones [44,45]. Glycosylated secondary products possess increased water solubility and molecule stability, which could change their biological activity [44]. Peroxisome biogenesis protein might participate in the synthesis of peroxisomes, a metabolic organelle that exists in all eukaryotic cells [46]. Peroxisomes contribute to resistance against oxidative stresses, βand α-oxidation of fatty acids, and synthesis of ether lipids [47,48]. The products of UGT and Peroxisome biogenesis protein seemed to confer advantages for plants survival in rainy climate [43]. It is reasonable that UGT and peroxisome biogenesis protein appeared as candidates for adaptation to rainfall climate.
Cysteine-rich receptor-like protein kinase (CRK) was the putative gene that we found was associated with the annual average temperature variable. CRKs are critical signaling components that regulate plant developmental and defense processes. In Arabidopsis, overexpression of a CRK gene confers drought tolerance without affecting plant growth [49]. Considering that the temperature variable would be generally correlated with drought stress, it is possible that there may be a difference in drought-associated loci among populations. Identification of putative candidate genes correlated with the environment would reveal a primary insight into functional genes mediating local adaptation. However, further studies are required in the future to explain the accurate roles of those candidate genes in the adaptation processes of Chinese elms.

Conclusions
The present study analyzed the genetic diversity and adaptation of seven natural populations of Chinese elms in eastern China. The trees were genotyped by SLAF-seq technology, and then identification of SNPs was carried out. The natural population of Chinese elms showed a moderate level of genetic diversity (PIC = 0.2632~0.2761), low level of genetic differentiation, and a simple population structure (K = 1). The association analysis of genetic markers and environmental factors resulted in putative markers involved in local adaptation. A blast search was conducted to detect underlying putative candidate genes for the correlated markers. A total of five genes could be annotated, which were related to the functions of glycosylation, peroxisome synthesis, nucleic acid metabolism, energy metabolism, and signaling. The results will be helpful for future work on molecular breeding of this species.