Population Genetic Structure Analysis Reveals Significant Genetic Differentiation of the Endemic Species Camellia chekiangoleosa Hu. with a Narrow Geographic Range

: In order to protect and utilize the germplasm resource better, it is highly necessary to carry out a study on the genetic diversity of Camellia chekiangoleosa Hu. However, systematic research on population genetics analysis of the species is comparatively rare. Herein, 16 highly variable simple sequence repeat (SSR) markers were used for genetic structure assessment in 12 natural C. chekiangoleosa populations. The genetic diversity of C. chekiangoleosa was low (h = 0.596), within which, central populations (such as Damaoshan (DMS), Sanqingshan (SQS), and Gutianshan (GTS)) at the junction of four main mountain ranges presented high diversity and represented the center of the C. chekiangoleosa diversity distribution; the Hengshan (HS) population in the west showed the lowest diversity, and the diversity of the eastern and coastal populations was intermediate. C. chekiangoleosa exhibited a high level of genetic differentiation, and the variation among populations accounted for approximately 24% of the total variation. The major reasons for this situation are the small population scale and bottleneck effects in some populations (HS and Lingshan (LS)), coupled with inbreeding within the population and low gene flow among populations (Nm = 0.796). To scientifically protect the genetic diversity of C. chekiangoleosa , in situ conservation measures should be imple-mented for high-diversity populations, while low-diversity populations should be restored by re-introduction.


Introduction
The genus Camellia L. includes many valuable shrubs, economic trees, and endangered species belonging to the family Theaceae, among which Camellia japonica L. is an internationally known ornamental flower [1]. In addition, C. oleifera Abel., a species of oil tea camellia, is considered one of the four major woody oil tree species [2]. Oil tea camellia generally refers to a tree species in the genus Camellia with a high oil content and high production and cultivation value [3]. In China, oil tea camellia has a cultivation history of more than 2,300 years, and it has been used as an oil tree species for more than 1,000 years [4]. Currently, there are more than 20 species of oil tea camellias that are mainly planted in China [5]. As the two major oil tea camellia species, C. oleifera and C. chekiangoleosa Hu. have been usually cultivated in southern China [6]. However, compared with the former, molecular basis for the protection of C. chekiangoleosa natural resources and provide a technical reference for their rational utilization.

Population Sample Information
Samples were taken from 12 existing natural populations of C. chekiangoleosa within a certain scale in 4 provinces within the Jiangxi, Zhejiang, Fujian, and Hunan Provinces (Supplementary Materials Table S1). An appropriate sample size was selected according to the population size, but at least 30 single plants were sampled in each population; the distance between individuals was more than 20~30 m; the sampled plants showed normal growth without diseases or pests. Young tissue samples (young leaves, buds, or flower buds) were collected and stored via the silica gel drying method. Based on the longitude and latitude of the sampling points indicated by a GarminGPS60 device (WGS84 coordinate system), and the sampling map ( Figure 1) was drawn by using Global Mapper [25]. The distribution of the geographic information and sample quantities at each sampling point (correspondence between population name and code) are also shown in Table S1.

Experimental Methods
Total DNA was extracted via the cetyl trimethylammonium bromide (CTAB) method [26], and a NanoDrop-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) was used to detect the concentration of DNA. After electrophoresis in 0.8% agarose gelatin, the integrity of the purified DNA was determined, and qualified DNA samples were stored at −20 °C .
In this study, 16 pairs of highly polymorphic EST-SSR primers (Supplementary Materials Table S2) for C. chekiangoleosa were selected out from our previous research [11]. All primers were synthesized by Shanghai Generay Biotech Co., Ltd. The polymerase chain reaction (PCR) system had a volume of 10 μL, which included 1 μL of 10 × PCR buffer, 1.5 mM Mg2+, 100 μM dNTPs, upstream and downstream primers at 0.5 μM, 1.25 U of Taq polymerase (5 U/µ L), and 50 ng of DNA. The amplification reaction procedure was as follows: pre denaturation at 94 °C for 5 min; 30 cycles of 94 °C for 30 s, 55 °C for 30 s, and 72 °C for 30 s; a final extension at 72 °C for 1 min; then, storage at 8 °C . During the test, the annealing temperature was increased or decreased according to the melting temperature (Tm) of each primer. The PCR products were uniformly added to 8% polyacrylamide gels, along with a 50 bp labeled fragment size marker (TIANGEN, Beijing, China); standard electrophoresis was performed at a 120 V constant voltage for 2 h; the product was detected by silver staining. Digital photographs were recorded, and Quality One software (Bio-Rad Laboratories Inc., California, USA) was used to quantitatively analyze the fragment size of each detection site according to the standard molecular weight of the 50 bp marker; finally, the data were finally counted.

Data Analysis
The electrophoretic bands were read according to codominant markers, and the sizes of the allele fragments at each point were recorded. The data were imported into Excel according to the requirements. The polymorphism information content index (PIC) and linkage disequilibrium were calculated by using PowerMarker 3.25 (bioinformatics research center; North Carolina State University, Raleigh, NC, USA) [27], and Bonferroni correction was performed to adjust the P value. Popgene 1.32 software [28] was used to calculate the following: number of observed alleles (Na); number of effective alleles (Ne); Nei's gene diversity (h); the Shannon diversity index (I); observed heterozygosity (Ho); expected heterozygosity (He); the inbreeding index (F). The genetic differentiation index (GST, based on He; FST, based on an infinite allele model, IAM; RST, based on a stepwise mutation model-SMM), gene flow (Nm = (1−FST)/4FST), and allele richness (AR) were calculated across all populations at each locus and over all loci using FSTAT 2.9.3 [29].
The genotype data format was transformed by using Genelex 6.4 [30]; an analysis of molecular variance (AMOVA) was carried out for the sources of different levels of genetic variation within and among populations using Arlequin 3.5 [31]. The results of the analysis of the genetic relationships between populations were displayed by factorial correspondence analysis (FCA) with the analysis software Genetix 4.05 (CNRSUMR 5000; Universite Montpellier II, Montpellier, France) [32]. The isolation by distance pattern (IBD) was detected by Mantel tests with 1000 permutations based on matrices of pairwise genetic differentiation (Multilocus estimates of genetic differentiation, Fst, expressed as Fst/(1−Fst)), and geographic distance among populations were performed in NTSYS 2.10e [33][34][35]. STRUCTURE 2.3.1 software [36] was used to infer the population genetic structure with the allelic model. Moreover, the number of distinguishable groups was initially set between k = 1 and k = 15, and the false positioning points were independent. The length of the burn-in period at the beginning of the Markov chain Monte Carlo simulation algorithm (MCMC) was set to 50000 times, and the MCMC after non-counting iteration was then set to 50000 times for the estimation of the α value to determine whether a separate population existed (α ≤ 0.2 was the standard for the classification of the test samples). A suitable value of was selected according to the size of △K. △K is the difference in LnP(D) based on adjacent K values. Finally, Bottleneck 1.2 software [37] was used to determine whether each group had experienced a bottleneck effect in its recent history. A two-phase mutation model (TPM) and a stepwise mutation model (SMM) were used for Wilcoxon signed rank tests. The parameter settings included a 90% SMM and 10% TPM with 12% variation in TPMs and 1000 repeats.

Genetic Diversity
Using 16 pairs of primers to detect 528 individuals in 12 geographical populations of C. chekiangoleosa (Table 1), a total of 74 alleles were obtained. The amplified alleles of a single primer ranged from 3 to 7. The average and effective allele values were 4.625 and 2.682, respectively. The highest amplified effective allele value obtained was for CC_eSSR03 (3.311), and the lowest obtained was for CC_eSSR16 (1.545). The average PIC value of the 16 loci was 0.616, reflecting a high polymorphism level, among which CC_eSSR16 showed the lowest PIC (0.352), and CC_eSSR03 showed the highest (0.698). These results were consistent with the effective equivalent gene comparison result, and the average h at the species level reached 0.596.  The results of the genetic diversity analysis of each geographical population of C. chekiangoleosa are shown in Table 2. Overall, the genetic diversity levels of different geographical populations of C. chekiangoleosa were quite different. AR ranged from 1.625 (HS) to 3.982 (GTS), with an average of 3.098, and the average He was 0.458. The SQS population presented the highest level of genetic diversity and the highest value of He, at 0.575. The lowest He value was found in the HS group, at 0.160. The 12 populations were ranked from high to low, according to the values He, as follows: SQS > DMS > GTS > LS/RLX > WYS > XP > tr > JRS > WYL > FA > HS. The diversity of LS and RLX was similar. The diversity parameter I ranged from 0.290 to 1.029 in the populations, with an average of 0.787. Both He and I show that the diversity of each population was maintained at an intermediate level. The results for the 12 populations sorted according to He and I were different, but the highest values appeared among DMS, SQS, and GTS, and the lowest values were obtained in the HS population, which also showed the lowest percentage of polymorphic loci among the 12 populations (43.75%). The above results indicate the reliability of genetic diversity evaluation.
The fixed index (F), also known as the inbreeding coefficient, can reflect the gene flow and inbreeding of the population. The F value of the natural C. chekiangoleosa population in this study was greater than 0, and the F value of the HS population was 0.087, which was the closest value to 0. The fixed indexes (F) of the RLX population and WYL population were relatively large, at 0.345 and 0.360, respectively, but those of the other populations were lower; the average F was 0.205, which generally indicated that the population deviated from the Hardy-Weinberg equilibrium, as reflected in the excess of homozygotes (Table 2).

Genetic Differentiation
The level of population genetic differentiation was analyzed based on Nei's genetic diversity, IAM, and SMM. The statistics showed that the average GST, FST, and RST were 0.234, 0.239, and 0.253, respectively. It indicated that there was a high level of genetic differentiation among the C. chekiangoleosa populations (Table 1). Additionally, the AMOVA of the components within and among populations performed with Arlequin 3.5 software also showed that there was significant genetic variation (p < 0.001), where 24.15% of the genetic variation existed among populations, while 75.85% of the genetic variation occurred within populations (Table 3). In addition, the gene flow (Nm), estimated based on FST, was consistent with that estimated based on GST (Table 1; Supplementary Materials  Table S3)  In this study, the TPM model and the SMM model of the Wilcoxon signed rank test [37] were used to analyze the bottleneck effect. According to the data (see Table 2), with the exception of the HS and LS groups, which recently experienced bottleneck effects (p < 0.05), no other groups experienced bottleneck effects. According to 3D graphics of factorial correspondence analysis, the 12 populations were divided into 3 groups: Group I, Group II, and Group III. Group I was the western cluster, and only contained HS groups; Group II was a central cluster with DMS at its core, which was distributed on the ridge connecting the Huaiyu Mountain Range to Wuyi Mountain Range in the south, and included the LS, GTS, SQS, DMS, RLX, WYS, and LRS populations; Group III was an eastern coastal cluster with WYL at its core, including the WYL, ZR, FA, and XP populations ( Figure 2). There were certain genetic differences among the groups, especially between the HS population and the other populations.
In addition, the correlation analysis between genetic differentiation (FST/1-FST) and spatial geographical distance was conducted (Supplementary Materials Table S3) to check whether isolation by distance (IBD) existed. The correlation analysis showed that the correlation between the 2 groups was very significant (r = 0.763, p < 0.001), indicating that geographical isolation is the main cause of genetic differentiation (Figure 3).

Genetic Structure
The genetic structure of the 12 studied populations was inferred using the STRUC-TURE software. The most probable division with the highest ΔK value was detected at K = 10 (Supplementary Materials, Figure S1a,b). Figure 4 showed the genetic structure of the 12 C. chekiangoleosa populations, simulated based on the mathematical model. Each color represents a genetic cluster, and the colored segments show the individual's estimated ancestry proportion to each of the genetic clusters. The 12 populations were clustered into 10 clusters. The DMS population presented the most gene introgression from the GTS, WYS, SQS, LS, and XP populations. The GTS population presented the most gene introgression from the WYS population and LS population. However, the ZR, FA, and WYL populations were mixed, and the degree of differentiation between these populations was extremely weak. Genes from the XP and LS populations mainly infiltrated the mixed group composed of the last three populations. In the SQS, LS, and XP populations, there were also some individuals with similar genetic backgrounds to the GTS population, and the HS population was the most independent.

Evaluation of Genetic Diversity in Camellia chekiangoleosa
Previous research has shown that the geographical distribution range of a species often presents a strong correlation with its genetic diversity [23]. Especially, compared with widely distributed plant species, species with narrow geographical ranges or endangered species theoretically show lower genetic diversity [38]. In this study, C. chekiangoleosa showed low genetic diversity (h = 0.596) at the species level, that was higher than the average He (when the calculation was based only on polymorphism and double allele loci, this parameter was equivalent to Nei's genetic diversity h) of endangered species (0.420) but lower than that of widely distributed species (0.620) [17]. The results are consistent with previous studies. The diversity level according to similar markers was even lower than that in other widely distributed related species. A total of 96 pairs of EST-SSR primers were used to evaluate the genetic diversity of C. sinensis in the main production areas of China, and the value of h reached 0.64 [39]. When 4 pairs of SSR molecular markers were used to evaluate the genetic diversity of the ancient C. japonica population, the He value reached 0.84 [40], which was higher than the genetic diversity of the wild populations of C. chekiangoleosa. Meanwhile, the assessment of population genetic diversity can reveal the species adaptability to the different environment [41]. Among the C. chekiangoleosa populations, the SQS, DMS, LS, and GTS populations, with high genetic diversity, are concentrated in the area linking the Wuyi Mountain Range population and the Huaiyu Mountain Range population in the north, which we believe is suitable establish areas for C. chekiangoleosa. In addition, because of the barrier imposed by the Heng Mountains, the diversity of the HS population in the west was lowest under the pressure accumulation of environmental selection.
It is helpful to analyze the reasons for the low genetic diversity of C. chekiangoleosa especially when compared with related species. In the genus Camellia, some endangered tree species with narrow distributions also show low genetic diversity [42]. When researchers evaluated the diversity of 12 wild populations of the narrowly distributed tree species C. huana with chloroplast fragments and 12 pairs of SSR primers, the obtained He, based on SSRs, was 0.466 [22]. Chen [20,44]. The genetic diversity of the above related species was thus shown to be similar or slightly higher than that recorded in our study. This is related to the fact that most populations of C. chekiangoleosa are not only affected by land loss but are also seriously disturbed by human beings. In addition, certain inbreeding phenomena were identified in each population, and the F values of the WYL and RLX populations reached 0.36 and 0.345, indicating that excess homozygosity is one of the internal reasons for the reduction in genetic diversity [45]. Some individual populations, such as the HS population, have experienced bottleneck effects and genetic drift, resulting in the loss of genetic diversity.

Population Genetic Structure Analysis Reveals Significant Differentiation of Populations
Compared with the genetic differentiation coefficients of 106 plant species based on SSR molecular markers [17], the genetic differentiation level of C. chekiangoleosa populations (FST = 0.239) was close to the average level of narrowly distributed species (FST = 0.23), but lower than widely distributed species (FST = 0.25). Therefore, C. chekiangoleosa presents a high differentiation level as a narrowly distributed species. AMOVA showed that 24.15% of the observed variation occurred among the populations of C. chekiangoleosa, which was similar to findings in the related species C. huana with a narrow distribution [23]. The result also reflected the fact that the population variation of C. chekiangoleosa was mainly attributed to the variation between individuals within the population. Moreover, our results (GST = 0.234) are lower than previous studies on the genetic differentiation of C. chekiangoleosa (GST = 0.3758) [12], which may be caused by different markers, and the population chosen in our study had higher coverage density.
Natural selection is the most important evolutionary driving force leading to population differentiation, while gene flow is an important factor hindering population differentiation [46]. In addition to the distribution characteristics mentioned above, the diffusion mechanism of plant pollen and seeds also impacts the average gene flow between populations [47]. The seeds of C. chekiangoleosa exhibit similar large and heavy traits to C. japonica and C. flavida (H.T. Chang), which are mainly transmitted by gravity [40,48]. C. chekiangoleosa is mainly distributed in high mountains at altitudes of 600~1400 m and is pollinated by insects. Therefore, mountains hinder the diffusion of seeds and pollen to a certain extent, thus limiting the spread of seeds and pollen [49]. The FST-based gene flow value (Nm) of C. chekiangoleosa was only 0.796, which is in line with the above statement. Slatkin concluded that, if the Nm of migrating individuals per generation is less than 1, genetic drift will become the dominant factor dividing the genetic structure of a population [50,51]. In this study, there was a significant positive correlation between spatial distance and genetic differentiation that reflected a significant geographical isolation effect among the populations of C. chekiangoleosa, which was consistent with the division of population genetic structure caused by lower gene flow [51].
The small-scale distribution of genetic variation within populations, and the largescale spatial distribution among populations, are two of the important characteristics of population genetic structure [41]. This study showed that 12 geographical populations of C. chekiangoleosa were divided into 10 groups, and there was gene infiltration among the 10 groups, which also reflected the continuity of the population historical distribution of C. chekiangoleosa. The HS population was independent of several other populations, and the genotypes within individuals of this population were relatively distinct due to genetic drift. In addition, HS was located far from other populations, so it can be expected that further genetic drift will be revealed in this population. From the perspective of gene diversity and the Shannon index, it is found that the DMS population showed a relatively high level of diversity. The reason may be that the frequent activities of insects lead to the intensification of gene interaction in the low altitude environment with high average temperature. In addition, the combination of the FA, ZR, and WYL populations may have been related to the fact that they were all located in the Donggong mountain range, and these individuals may have come from the same or similar ancestors. In summary, we infer that C. chekiangoleosa populations can be basically divided into 3 distribution clusters according to the division of mountain ranges: a central cluster, with DMS as the center, which was distributed on the ridge connected with 4 mountain ranges ( Figure 1A-D); an eastern coastal cluster with WYL as the center; a western cluster with HS as the center.

Conservation of Camellia chekiangoleosa Genetic Resources
In summary, this study indicates that the fundamental reason for the endangered status of C. chekiangoleosa is the restriction of gene flow among its populations, resulting in a reduction in genetic diversity and an inability to maintain the ability to adapt to the changing environment [52]. First, similar to other Camellia species, the pollen transmission mode of C. chekiangoleosa depends mainly on insects, while its seeds are mainly transmitted by gravity and animals, which are affected by mountain isolation [48]. Additionally, human activities have exacerbated the fragmentation of the natural habitat of C. chekiangoleosa, which is bound to further reduce the pollen flow between groups. In addition, according to the research of Yang [53], the pollen of C. chekiangoleosa shows a short survival time and a long pollen tube germination time, which is likely to result in abortion if flowers are not pollinated in a timely manner. According to our field investigation, due to the low average temperature in high-altitude mountainous areas, the flowering period of C. chekiangoleosa is long, from November to March of the next year. However, the flowering branches growing on the sunny side of the outer layer of the canopy bloom earlier and often bloom in the inner layer of the canopy, while the flowers in the outer layer are overmature. Therefore, the asynchronous flowering periods within a given stand affect the fruit setting of C. chekiangoleosa, imposing high environmental selection pressure on the existing population. If the protection of the species is not strengthened, its genetic differentiation will further increase, and the selfing rate of the remaining population and the probability of genetic drift will also increase, which will continue to reduce genetic diversity.
According to the results of the genetic diversity assessment and the genetic structure analysis, priority should be given to the local protection of populations with high diversity, such as the DMS, SQS, LS, and GTS populations. Protected areas or stations can be established to strengthen management and eliminate logging and damage. Second, the habitats suitable for the growth and population reproduction of C. chekiangoleosa should be restored with the aim of increasing the heterogeneity and stability of the habitat, which is particularly important for small populations. For populations with low genetic diversity (HS, FA), seeds should be collected within the population (or the most genetically similar population) whenever possible, and artificial seedling cultivation should be carried out according to the biological characteristics of the plants. These plants can then be reintroduced to the original habitat so that the population can slowly recover and grow-on the basis of preserving the original genotype.

Conclusions
In our research, natural populations of C. chekiangoleosa, covering the main distributed range, were used to study genetic diversity and differentiation. C. chekiangoleosa, as an endemic species with narrow distribution, has low genetic diversity, but its central population still has relatively high genetic diversity. In addition, the level of genetic differentiation among populations is high, and the genetic structure analysis also found that Hengshan population, central populations, and eastern populations can be divided into independent groups, indicating that the isolation of mountains may be the main reason for the formation of genetic differentiation. Therefore, considering the genetic diversity difference and significant differentiation of C. chekiangoleosa population, appropriate diversity protection strategies should be adopted according to the genetic diversity level and geographical distribution of different populations.
Supplementary Materials: The following supporting information can be downloaded at: www.mdpi.com/article/10.3390/f13020234/s1, Table S1: Geographical distribution of 12 C. chekiangoleosa geographic populations, Table S2: Repeat motif, primer sequence, fragment size, Tm and data sources information for 16 EST-SSR loci, Table S3: Genetic differentiation in C. chekiangoleosa populations by Nei's, Table S4: Pair-wise FST value and spatial distance matrix of C. chekiangoleosa, Figure  S1: (a) The mean log-likelihood value of the data was based on ten repetitions for each K value, (b) The delta K value was changed with each K value.