Genetic Diversity and Population Structure Revealed by SSR Markers on Endemic Species Osmanthus serrulatus Rehder from Southwestern Sichuan Basin, China

: Osmanthus serrulatus Rehder (Oleaceae) is an endemic spring ‐ flowering species in China. It is narrowly distributed in the southwestern Sichuan Basin, and is facing the unprecedented threat of extinction due to problems associated with natural regeneration, habitat fragmentation and per ‐ sistent and serious human interference. Here, the genetic diversity and population structure of 262 individuals from ten natural populations were analyzed using 18 microsatellites (SSR) markers. In total, 465 alleles were detected across 262 individuals, with a high polymorphic information content ( PIC = 0.893). A high level of genetic diversity was inferred from the genetic diversity parameters ( He = 0.694, I = 1.492 and PPL = 98.33%). AMOVA showed that a 21.55% genetic variation existed among populations and the mean pairwise Fst (0.215) indicated moderate genetic population dif ‐ ferentiation. The ten populations were basically divided into three groups, including two obviously independent groups. Our results indicate that multiple factors were responsible for the complicated genetic relationship and endangered status of O. serrulatus . The concentrated distribution seems to be the key factor causing endangerment, and poor regeneration, human ‐ induced habitat loss and fragmentation seem to be the primary factors in the population decline and further genetic diversity loss. These findings will assist in future conservation management and the scientific breeding of O. serrulatus.


Introduction
Osmanthus serrulatus Rehder is an evergreen shrub or small tree with long and slender pedicels and weeping flowers that bloom in spring, which distinguishes it from other species [1,2]. Endemic to China, O. serrulatus is commonly known as Baoxing Osmanthus, and it is sporadically distributed among the high-altitude mountains around the southwest Sichuan Basin [3]. Our previous work on O. serrulatus indicated that persistent human interference and increasing habitat fragmentation were the major external factors posing a critical threat to its wild populations [3], while its characteristic functional androdioecy [4], seed germination every other year [5], and unique habitat requirements [6,7] might be the internal factors leading to a dramatic decline in the population and slow regeneration of the natural habitat. These findings play a crucial role in new Osmanthus cultivar breeding and highlight subtropical forest biodiversity patterns and mechanisms in China. However, research on O. serrulatus, especially genetic studies [8], remains limited. Despite research efforts in regard to the phylogeny and evolution [9,10], genetic diversity and population structure [11][12][13][14] of O. fragrans (Thunb.) Lour., there is an urgent need to explore the genetic relationships of O. serrulatus with neutral markers, e.g., microsatellites, to obtain a better understanding of its genetic diversity and population structure.
As the basis of diversity in species and ecosystems, genetic diversity is an inherent form of biodiversity [15], reflecting the ability of organisms to adapt to the changing environment through natural selection [16,17]. Generally, populations with little genetic variation are vulnerable to the arrival of climate change and habitat destruction, while those of high genetic diversity can withstand the rigorous natural selective pressures [17,18]. It is difficult for rare and endangered species to expand their distribution range and explore new areas for survival because the cumulative loss of genetic variation leads to a continuous reduction in population size [17,19,20]. As an expression vector of genetic traits, the protection of genetic resources is actually to protect the evolutionary potential and environmental adaptability of different species and individuals [21,22]. Exploring genetic diversity and population structure enables an evaluation of the evolutionary potential and adaptability and the prediction of future changes in genetic composition [23]. Genetic conservation is a fundamental task for species preservation and plant germplasm innovation [24,25].
To date, microsatellites (simple sequence repeats, SSRs) are some of the most frequently used molecular markers to assess genetic diversity and population structure, due to their co-dominance inheritance, high polymorphism, high stability and repeatability, rich content distribution, and low cost of detection [19,26]. In this study, we used 18 biparentally inherited polymorphic SSR markers to evaluate the genetic diversity and structure of ten natural populations throughout the current main distribution range. The results will provide molecular evidence for the conservation and utilization of O. serrulatus germplasm.

Plant Material and DNA Extraction
Based on the historical data in the literature and for specimens of O. serrulatus, the geographic distribution and location information for this species were synthesized ( Figure  1A). Through extensive field investigation over the past few years, ten natural populations from southwestern Sichuan, China, were located and identified as from a wild provenance ( Figure 1B). Fresh leaf samples were collected from 262 individuals of the ten populations (Table 1) and immediately dried in silica gel. Total genomic DNA was extracted from dried leaves using the Plant Genomic DNA Kit (Tiangen, Shanghai, China) according to the manufacturer's instructions. The quality and concentration of DNA were evaluated using a NanoDrop TM 2000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA). Extracted DNA samples were adjusted to 50 ng/μL and subsequently stored at −20 °C.  Table 1.

Primer Screening and PCR Amplification
A total of 100 SSR primers from the developed primers based on high-throughput transcriptome sequencing data [8,27] were randomly selected and 15 primers with stable polymorphic bands were screened by the DNA mixed pooling method [28]. Then, the 15 specific primers with three universal primers (OSM005, OSM010, OSM038) [29] were used to estimate the genetic diversity and structure of the natural O. serrulatus populations (Table S1). The SSR-PCR amplification was performed on the Applied Biosystems TM Automated Thermal Cycler (Thermo Scientific, Wilmington, DE, USA) as follows: a pre-denaturation at 94 °C for 5 min; followed by 30 cycles of 4 s denaturation at 94 °C, 30 s annealing at 55 °C and 30 s extension at 72 °C; a final extension at 72 °C for 5 min; and then preservation at 4 °C [28]. The reactions were conducted in a 25 μL volume containing 50 ng of template DNA, 1.0 μL of 100 μmol/L of each forward and reverse primer, 12.5 μL of 2× Taq PCR Master Mix, and 9.5 μL ddH2O. The PCR products were analyzed on 8% polyacrylamide gel and silver-stained to detect SSR bands, and the length of the bands were compared with a 50bp DNA ladder (Tiangen, Shanghai, China) to estimate allele size.

Data Analysis
The genetic diversity parameters were calculated by GenAlEx 6.502 [30] and POPGENE 1.3.2 [31], including the number of observed alleles (Na), number of effective alleles (Ne), Shannon's information index (I), the mean number of private alleles (Np), observed heterozygosity (Ho), expected heterozygosity (He), percentage of polymorphic loci (PPL), Wright's fixation index (F), inbreeding coefficient among individuals (Fis), genetic differentiation coefficient among populations (Fst), gene flow (Nm), and the deviation from Hardy-Weinberg equilibrium (HWE) with the chi-squared test for each population. The polymorphic information content (PIC) and null alleles frequency (Fn) were computed by Cervus 3.0.7 [32]. Bottleneck 1.2.02 [33] was employed to test for the recent bottleneck events using the one-tailed Wilcoxon sign-rank test with the infinite alleles model (IAM, 30%), the stepwise mutation model (SMM, 70%) and two-phase model (TPM), and a further mode-shift test was also performed. GenAlEx 6.502 was also used to calculate the pairwise Fst, analysis of molecular variance (AMOVA), principal coordinate analysis (PCoA), and Mantel test based on the pairwise matrix of geographic distance and Nei's genetic distance in terms of Fst/(1-Fst) with 1000 permutations. The unweighted pair group method with arithmetic mean (UPGMA) clustering based on Nei's genetic distance among the populations and among the individuals were constructed using MEGA X [34] and visualized by iTOL v6.3 (Interactive Tree of Life) [35]. The population structure was determined with a Bayesian clustering analysis using STRUCTURE v2.3.4 [36]. A total of ten independent simulations for each K ranging from 2 to 10 were performed with a burnin period of 100,000 steps followed by 100,000 Markov Chain Monte Carlo (MCMC) iterations using the Admixture Model. The most probable number of population groups (K) was determined with delta K (ΔK), following Evanno et al. [37] by web-based STRUCTURE HARVESTER [38]. Repeated sampling analysis and genetic structural plots were analyzed by CLUMPP 1.1.2 [39] and visualized by DISTRUCT 1.1 [40]. The associated membership probability value (Q value) of different genetic clusters in each individual was represented by vertical colored column.

SSR Loci Polymorphism
Across the 18 SSR loci, a total of 465 alleles were detected among 262 individuals from ten populations, and the Na detected per locus ranged from 16 to 41, with an average of 25.83 (Table S2). The mean Ne was 10.748, and ranged from 8.651 (Os48) to 16.837 (Os87). All the values of Ho of 18 SSR loci were much smaller than He, except Os35. The mean Ho was 0.279, and ranged from 0.013 (Os74) to 0.907 (Os35), while the mean He was 0.694, and ranged from 0.602 (Os74) to 0.831 (Os87). The values of PIC and I averaged 0.893 and 1.492, respectively, indicating a high polymorphism and suitability for the genetic analysis. The mean F of 18 SSR loci was 0.623, ranging from 0.480 (Os72) to 0.887 (Os03). The values of Fis and Fst were 0.602 and 0.228 on average, respectively, and the loci Os35 had the minimum values, including a negative inbreeding coefficient (Fis = −0.111). In addition, the mean values of Nm and Fn were 0.969 and 0.556, respectively, and the HWE test across the 262 individuals showed significant deviations for all loci.

Population Genetic Diversity and Mutation-Drift Equilibrium
At the population level, the genetic diversity index was estimated across the ten populations (Table 1). Overall, the average percentage of polymorphic loci (PPL) was 98.33%, with almost all populations at 100% except EMS (94.44%) and BCP (88.89%). The Na per population ranged from 4.167 (MPZ) to 8.500 (YCP), with an average of 6.894, while the mean Ne was 4.381, ranging from 3.062 (MPZ) to 5.532 (DLS2). The values of Ho and He ranged from 0.136 (MPZ) to 0.373 (DLS1) and from 0.621 (WZX) to 0.774 (DLS2), respectively. The mean I was 1.492 over a range of 1.184 (MPZ) to 1.779 (DLS2). The mean Np was 1.133, and ranged from 0.333 (BCP) to 3.056 (YCP). In addition, the average fixation index (F) was 0.623, ranging from 0.469 (YCP) to 0.813 (MPZ) at the population level, indicating a severe inbreeding and heterozygote deficiency among the natural populations.
In bottleneck analysis, heterozygosity excess based on the Wilcoxon sign-rank test was highly significant (p < 0.01) for the IAM model in almost all populations except WZX and for the TPM model in populations EMS, BCP, HZP, DLS2, MPZ, and XLXS, while not it was significant in most populations except BCP (p < 0.01) and MPZ (p < 0.05) with the SMM model (Table S3). Meanwhile, the mode-shift test did not detect any significant distortion in the allele frequency with a normal L-shaped distribution indicating the absence of bottleneck in O. serrulatus populations ( Figure S1).

Genetic Differentiation and Gene Flow between Populations
The genetic variation in populations was determined both by AMOVA (Table 2) and pairwise Fst (Table S4). AMOVA demonstrated that only 22.15% of the total molecular variance occurred among populations, while the remainder occurred within populations with 46.84% and 31.61% existing among and within individuals, respectively (p < 0.001). This was confirmed by small Fst and large Nm overall among the populations, representing 0.215 and 0.910 on average, respectively. In total, the pairwise Fst ranged from 0.123 to 0.303, of which most values were between 0.15 to 0.25, and the pairwise Nm ranged from 0.575 to 1.799, with most values around 1.0. Particularly, the smallest Nm and biggest Fst appeared between EMS and BCP, while the biggest Nm and smallest Fst appeared between XLXS and DLS2.

Population Clustering and Genetic Structure
Based on Nei's genetic distance, UPGMA and PCoA were used to access the population clustering. UPGMA clustering analysis intuitively divided the ten populations into three main groups (Figure 2A) based on a genetic distance threshold of 0.70. Specifically, the population EMS alone formed Group I (red), and population DLS1, DLS2 and XLXS clustered into Group Ⅱ (green), while Group Ⅲ (blue) contained the remaining six populations (YCP, HZP, XYG, WZX, MPZ and BCP). Overall, the result of PCoA (Figure 3) was similar to UPGMA, which approximately divided the individuals of ten populations into three groups. A further UPGMA-based cluster analysis among the 262 individuals was performed to reveal the specific relationship and genetic structure ( Figure 2B). The resulting tree showed that the individuals could be divided into five groups. Group I (red cluster) and Ⅱ (green cluster) were consistent with the UPGMA result among populations. The other individuals were separated into three groups. A set of individuals (14) from population HZP was gathered into a small Group Ⅲ (brown cluster), individuals of MPZ and BCP were gathered into another Group Ⅳ (purple cluster), while the remaining individuals were gathered into a complex Group Ⅴ (yellow cluster).  In the population structure analysis, there were two peaks of ΔK (the maximum for 4.853 and the second for 4.120) ( Figure S2A), indicating that the entire population could be divided into five or seven clusters ( Figure S2B). The genetic composition may be from one or several sub-populations, and the cluster membership probability of each individual represents the genetic relationship among individuals. Every component in an individual takes a certain proportion, which represents the membership probability value (Q value). When the Q value was over 75%, the individual was usually considered as pure, otherwise they are admixed [41]. There were 145 pure individuals and 117 admixed individuals when K = 5, and 199 pure individuals and 63 admixed individuals when K = 7. Nearly all populations except XYG were composed of mostly pure individuals when K = 7, while only half of the populations (EMS, YCP, HZP, DLS2 and XLXS) were pure when K = 5. Among all, the shared genetic origins of sub-populations indicated a frequent gene exchange among the populations, such as BCP, MPZ and WZX, and DLS1, DLS2 and XLXS. The population EMS (red), HZP (orange) and WZX (brown, when K = 7) were isolated as a unique genetic cluster with pure individuals, indicating a certain genetic specificity. Meanwhile, the cluster membership proportion of each population at K = 5 and K = 7 was mapped based on Q values (Figure 4). The resulting chart showed a relatively clear geographical distribution among populations. Additionally, the Mantel test revealed no significant correlation between Nei's genetic distance and geographical distance (R 2 = 0.0001, p ≤ 0.490; Figure S3) and also confirmed that genetic differentiation among populations was not caused by geographic isolation. More broadly, the clustering structures were not in conflict with the results of UPGMA and PCoA, because the population EMS, and the populations DLS1, DLS2 and XLXS seemed to have the same genetic origins in order to gather into independent groups.

Genetic Diversity of Osmanthus serrulatus
Generally, the PIC value is used as an indicator for the genetic informativeness of markers among the accessions [42]. As shown in this study, the highly polymorphic markers with a mean PIC value of 0.893 were very suitable for revealing the genetic diversity of the ten natural populations of O. serrulatus from southwestern Sichuan, especially the primer Os87. There was a high level of genetic diversity shown by the high values of expected heterozygosity (He > 0.6) and Shannon's information index (I > 1.0) of O. serrulatus (Table S2). Population size does indeed affect genetic diversity [43]. Among the populations, DLS2 with a larger local population size had the highest level of genetic diversity (He = 0.774, I = 1.779, Table 1), and its remote habitat with very little human interference might result in greater genetic diversity and lead to more intraspecific genetic variation in recent history. At the same time, the ex situ population MPZ with less samples had the lowest level of genetic diversity, and habitat changes and severe human disturbance including damage during transplanting might be other factors affecting the genetic diversity.
The mating system, that is, functional androdioecy, and the contrasting levels of gene flow might explain the high genetic diversity of this narrowly distributed species in some way [44,45]. The presence of null and dropout alleles will lead to a high deviation between Ho and He, and a high frequency of null alleles will cause the deviation from HWE in all loci [42,46]. In this study, all values of Ho were much smaller than He among the ten populations as well as among the 18 loci (except Os35), which may be caused by the high frequency of null alleles at all loci (0.017-0.966, mean Fn = 0.556), and further lead to an underestimation of population genetic diversity [46]. Meanwhile, the significant HWE deviation (p < 0.001) might present signs of inbreeding, which suggests high heterozygosity deficiency and nonrandom mating among individuals [47].
In general, the genetic diversity of plants is positively correlated to the geographical range and population size, and the population decline leads to a loss of genetic diversity [43,45,48]. Endangered, endemic and narrowly distributed species usually have low genetic diversity, but many studies have shown that even endangered plants can have a moderate or high level of genetic diversity, such as Michelia coriacea Hung T. Chang [53]. In addition, in contrast with the previous study on wild Osmanthus species using SSR markers, the observed genetic diversity of O. serrulatus was close to the high-mountain-endemic species, O. delavayi Franch. (He = 0.723, I = 1.657) [54] and widespread species, O. fragrans (He = 0.673) [13], but much higher than O. cooperi Hemsl. (He = 0.1419, I = 0.5559) [55], a widespread but sporadic species, indicating relatively stronger environmental adaptability and evolutionary potential.
Additionally, the genetic bottleneck effect was tested under the IAM, SMM and TPM models. The transient effect of population decline is usually measured by heterozygosity excess [56]. In bottleneck analysis, the IAM is usually recommended for allozyme data and microsatellites tend to evolve under the SMM and TPM models [33]. However, SMM is generally more appropriate for long-term dynamic analysis of large populations and only a few microsatellite loci strictly conform with it [57]. Consequently, the TPM is apparently even more appropriate than IAM and SMM in this study, and the populations EMS, BCP, HZP, DLS2, MPZ and XLXS might actually have experienced bottleneck events in recent history (p < 0.01, Table S3).

Genetic Differentiation and Population Structure of Osmanthus serrulatus
Population structure was manifested mainly as genetic differentiation among populations [58]. Nybom [53] proposed that the genetic variability of species with long-lived or outcrossing is mostly retained within populations, while in annual or selfing species, variability is mainly allocated among populations. In the present study, AMOVA results showed that only 21.55% genetic variations existed among O. serrulatus populations, suggesting that genetic variation mostly occurred within populations. According to Wright [59,60], Fst has usually been used for evaluating the genetic differentiation among populations with the following rules: Fst < 0.15, low; 0.15 ≤ Fst < 0.25, medium; Fst ≥ 0.25, high. The larger Fst and smaller Nm indicated greater differences among populations. Similar mean values of moderate genetic differentiation and medium gene flow were detected for all populations across the 18 SSR loci (Fst = 0.228, Nm = 0.969, Table S2) and between the populations (Fst = 0.215, Nm = 0.910, Table 2). Particularly, the pairwise Fst of EMS-associated populations was mostly more than 0.25, highlighting a high differentiation between these populations. Moreover, Nm was one of the main factors for homogenization, given that gene exchange between populations could counteract the genetic differentiation caused by genetic drift and geographical isolation when Nm > 1.0 [61]. Most pairwise Nm around 1.0 indicated a relatively frequent gene flow among O. serrulatus populations, suggesting the Nm partly replaced the effect of genetic drift or geographical isolation between some populations.
For population structure classification, the analysis of PCoA, UPGMA and STRUCTURE complement and validate each other, because PCoA and UPGMA only expound intuitive relationships without fully categorizing them, while STRUCTURE can categorize the populations objectively and determine the probability and degree of genetic exchange among different populations [62]. In the present study, we obtained five or seven sub-populations from the ten O. serrulatus populations using STRUCTURE ( Figure  S2). Overall, the EMS population, and the populations, DLS1, DLS2 and XLXS could definitely be separated into two independent groups. They had a relatively distinct genetic composition, whereas the remainder showed marginal differences based on the three methods, indicating complicated genetic relationships among different populations and different locations. Especially, population EMS and HZP were isolated as a unique genetic cluster with pure individuals whether K = 5 or K = 7, indicating a certain genetic specificity and limited gene exchange with others. Furthermore, some populations were geographically adjacent but quite different in genetic structure, such as EMS and BCP, HZP and YCP (Figure 4), suggesting no correlation between genetic structure and geographical distribution. This was also confirmed by the Mantle test (R 2 = 0.0001, p ≤ 0.490, Figure S3). As for the individuals, most individuals were pure in theory, whereas most populations had moderate gene flow. This may be explained by the patchy distribution and the mating system of O. serrulatus [4], i.e., the geographic barriers of altitude and distance restricted the gene exchange between different populations to some extent, and the functional androdioecy strengthened mating fitness within populations. According to the current results and the biological characteristics [4,5] and the demands of special habitats [6], we believe it is very likely that there is no population differentiation of O. serrulatus across the local area, and that the concentrated distribution might be the key factor in the endangerment of O. serrulatus.

Conservation Proposal of Osmanthus serrulatus
Sweet Osmanthus is one of the top ten traditional famous flowers in China. As a rare spring-flowering endemic Osmanthus species, O. serrulatus occupies an important phylogenetic node in genus Osmanthus, and is a crucial germplasm resource for genetic breeding of Sweet Osmanthus. Like rare and endangered species, endemic species preservation is challenged by habitat destruction and fragmentation [63]. Previous field studies have showed that the existing wild habitats of O. serrulatus populations were isolated in some fragmented, independent and narrow regions in high-altitude mountain areas in Sichuan. At the same time, a lot of historical sampling sites have disappeared due to land clearing, grazing and destructive exploitation. Some initial characteristics of Plant Species with Extremely Small Populations (PSESP) are associated with O. serrulatus, such as a decline in population size, natural regeneration is difficult, habitat fragmentation, and persistent and serious human interference [3,4]. Therefore, effective and efficient conservation practices should be put in place as soon as possible to reduce the population decline. Considering the high genetic diversity and moderate genetic differentiation of the wild O. serrulatus populations, human-induced habitat loss and poor regeneration are mainly responsible for the decline in the population size, while the accompanying genetic erosion and genetic drift might influence the genetic diversity in subsequent generations.
Maintenance of genetic diversity is a primary objective in the management of wild populations of threatened species [58]. Based on the results of present and previous work, both in situ and ex situ efforts on the original habitat and genetic germplasm are needed in order to formulate a conservation strategy for O. serrulatus [48]. Since genetic variation mainly occurs within populations, it is vital that preferential in situ conservation is implemented for populations with high genetic diversity and individuals with unique genes, such as populations DLS2, XLXS, YCP, and HZP. Simultaneously, it is necessary to introduce individuals from different gene pools into other gene pools to improve the reproductive fitness and maintain genetic diversity of different populations [58], especially the pure individuals of EMS and HZP. Additionally, ex situ conservation of as many individuals as possible via seeds or branches could be used as an auxiliary measure to establish a germplasm resources library to preserve the genetic diversity, by providing sufficient materials for germplasm innovation and possible reintroduction of this endemic species in the future.

Conclusions
This is the first study to perform an analysis of the genetic diversity and population structure of O. serrulatus in southwestern Sichuan using 18 SSR markers. Our results show high adaptability and evolutionary potential in the natural O. serrulatus populations. There was a high level of genetic diversity within populations, and moderate genetic differentiation among populations. The ten populations were mainly divided into three groups, with two distinct, independent groups and a complex mixed group. There was no correlation between genetic structure and geographical distribution. The geographic barriers of altitude and distance restricted the gene exchange between different populations to some extent, but only a moderate genetic differentiation occurred between populations. Multiple factors attributed to the complicated genetic relationship and endangered status O. serrulatus. Poor regeneration and human-induced habitat loss and fragmentation are mainly responsible for the decline in population size and genetic diversity, and the concentrated distribution might be the key cause of endangerment. Our study sheds light on the genetic diversity, population structure and genetic relationships among populations, and provides important information on the conservation strategy development, genetic improvement and scientific breeding of O. serrulatus.  Figure S3: Mantel test for matrix correlation between Nei's genetic distance and geographical distance of ten O. serrulatus populations Table S1: Characteristics of 18 polymorphic SSR loci in O. serrulatus, Table S2. Polymorphsim parameters of 18 SSR loci in O. serrulatus, Table S3. Summary of population bottleneck tests with four models (IAM, TPM, SMM, and Model-shift) in ten O. serrulatus populations, Table S4. Genetic differentiation coefficient Fst (below diagonal) and gene flow Nm (above diagonal) values between O. serrulatus populations across 18 loci.