Assessment of the Genetic Relationship and Population Structure in Oil-Tea Camellia Species Using Simple Sequence Repeat (SSR) Markers

Oil-tea camellia trees, the collective term for a class of economically valuable woody oil crops in China, have attracted extensive attention because of their rich nutritional and pharmaceutical value. This study aimed to analyze the genetic relationship and genetic diversity of oil-tea camellia species using polymorphic SSR markers. One-hundred and forty samples of five species were tested for genetic diversity using twenty-four SSR markers. In this study, a total of 385 alleles were identified using 24 SSR markers, and the average number of alleles per locus was 16.0417. The average Shannon’s information index (I) was 0.1890, and the percentages of polymorphic loci (P) of oil-tea camellia trees were 7.79−79.48%, indicating that oil-tea camellia trees have low diversity. Analysis of molecular variance (AMOVA) showed that the majority of genetic variation (77%) was within populations, and a small fraction (23%) occurred among populations. Principal coordinate analysis (PCoA) results indicated that the first two principal axes explained 7.30% (PC1) and 6.68% (PC2) of the total variance, respectively. Both UPGMA and PCoA divided the 140 accessions into three groups. Camellia oleifera clustered into one class, Camellia vietnamensis and Camellia gauchowensis clustered into one class, and Camellia crapnelliana and Camellia chekiangoleosa clustered into another class. It could be speculated that the genetic relationship of C. vietnamensis and C. gauchowensis is quite close. SSR markers could reflect the genetic relationship among oil-tea camellia germplasm resources, and the results of this study could provide comprehensive information on the conservation, collection, and breeding of oil-tea camellia germplasms.


DNA Extraction
Sample DNA was extracted by the TIANGEN genomic DNA extraction kit (Beijing, China). DNA quality and concentration were then checked by 1% (w/v) agarose gel electrophoresis and the Agilent 2100 Bioanalyzer (USA). Good quality DNA was used directly for SSR analysis or stored at -20 • C for further use.

SSR Analysis
Ninety-six pairs of SSR primers were selected for pre-screening based on the transcriptome data of C. vietnamensis (NCBI accession number: PRJNA825399) [24], in which 15 fluorescently labeled SSR primers were selected for further research ( Table 2). In addition, nine pairs of primers with good polymorphism were screened, referring to Song's study (Table 2) [25]. The 5' end of each forward primer for this analysis was labelled with FAM fluorescent dye (Applied Biosystems, USA). The M13 universal linker sequence (TGTAAAACGACGGCCAGT) was used to add to the 5' direction of the forward primer of each pair of primers, and M13 linker sequences with different fluorescent groups were synthesized. Following the method of Gu [26] with minor modification, the SSR-PCR amplification was performed in a 15 µL total reaction volume, including 1.0 µL (5 pmol·µL −1 ) of forward and reverse primers, 7.5 µL of 2 × Taq PCR master mix (Gene tech, Shanghai, China), 1 µL (50 ng·µL −1 ) of template DNA, and 4.5 µL of ddH 2 O. The PCR program was as follows: 96 • C, 3 min; 96 • C for 30 s, 50-60 • C for 30 s, and 72 • C for 1 min, and these three procedures were cycled 30 times; 72 • C, 10 min. Two microliters of amplified PCR products were used in 2% (w/v) agarose gel electrophoresis to check whether the amplified fragment size and concentration were in the normal ranges at each locus with reference to the DNA marker alignment. Then, 1.0 µL of the fluorescent PCR product was diluted 30 fold with ultrapure water and prepared for machine detection. The diluted PCR products were separated by capillary electrophoresis by the ABI 3730XL DNA Analyzer (Applied Biosystems, Foster City, CA, USA), and data were handled by Gene Marker v.2.2.0 software (Soft Genetics, State College, PA, USA).

Data Acquisition and Analysis
According to the PCR results, a binary matrix was formed in which the presence of the product was marked as 1 and the absence of the product as 0. The results of the 1/0 data matrix were utilized to analyze the genetic diversity of oil-tea camellia trees. Based on the number of alleles, the level of discrimination of each SSR marker was assessed by calculating the percentage of polymorphic loci (P), Nei's genetic diversity (h), Shannon diversity index (I) [27], gene differentiation coefficient (Gst) [26], and gene flow from Gst (Nm). Nei's genetic diversity (h) and Shannon diversity index (I) were calculated using the POPGENE software [28].
According to the DICE coefficient [29], Nei's genetic distance (D) and genetic identity between different groups were further calculated using GenAlex software [30]. The degrees of genetic variation among and within groups were analyzed by the non-hierarchical analysis of molecular variance (AMOVA) method, with 9999 random permutations [28]. Then, the unweighted pair group method with arithmetic (UPGMA) and the principal coordinates analysis (PCoA) were performed [31]. PCoA analysis was performed with GenAlex software. Linkage disequilibrium was analyzed using the pair.ia method of the R package poppr, and plots were drawn in R. In addition, the genetic structure of oil-tea camellia samples was analyzed by STRUCTURE [32], which is a model-based Bayesian clustering program with a range of genetic clusters from K = 3 to 10. Twenty independent runs were evaluated for each fixed K, and the best potential clusters (K value) were checked by the ∆K method on the STRUCTURE Harvester program [32]. The running results were integrated by CLUMPP software [33].

Assessment of SSR Marker Diversity Levels
The 140 accessions from five oil-tea camellia species were analyzed by SSR markers. The alleles detected by 24 pairs of primers at the polymorphic sites ranged from 6 to 31. A total of 385 alleles were generated by amplification, resulting in an average of 16.0417 alleles per locus ( Table 3). The mean of Nei's gene diversity (h) and Shannon's information index (I) were 0.1104 and 0.1890, which indicate that the genetic diversity was not very rich. It can be seen in Table 3 and Table S1 that the range of total genetic variation Ht was 0.0019-0.5000; the average value was 0.1153. The range of genetic variation within population Hs was 0-0.4601; the average value was 0.0698. The gene differentiation coefficient Gst value ranged from 0.0037 to 1.0000, and the average was 0.3948, indicating 39.48% genetic variation among individuals and a high degree of genetic differentiation. The range of gene flow (Nm) values of the whole population was 0-134.0618, and the average value was 0.7666, indicating that there was little gene exchange among the oil-tea camellia group. The alleles at each locus in each sample were coded into a fingerprint in the form of a 0/1 matrix based on bands amplified using 24 pairs of primers. Fingerprinting gives a visual representation of the differences for each sample ( Figure 1). As could be found from the fingerprinting of 140 oil-tea camellia accessions, these 24 pairs of primers could discriminate some of the 140 accessions.

Genetic Diversity of Oil-Tea Camellia Species Based on SSR Analysis
The detailed information of each genetic locus of each species is shown in Table S2. The average sample size was 28 for each species (Table 4). The mean Na was 0.735 (range: 0.200-1.605). The average Ne was 1.138 (range: 1.041-1.197). The mean h was 0.086 (range: 0.027-0.128), and the mean uh was 0.096. The average I s within species reached 0.134. S1 had the highest genetic variability (0.214), and S4 had the lowest value (0.041). When computed at the individual level, the mean I was 0.1890. The results indicate that the genetic differences among different groups were small and the genetic diversity was not very rich.

Genetic Diversity of Oil-Tea Camellia Species Based on SSR Analysis
The detailed information of each genetic locus of each species is shown in Table S2. The average sample size was 28 for each species (Table 4). The mean Na was 0.735 (range: 0.200-1.605). The average Ne was 1.138 (range: 1.041-1.197). The mean h was 0.086 (range: 0.027-0.128), and the mean uh was 0.096. The average Is within species reached 0.134. S1 had the highest genetic variability (0.214), and S4 had the lowest value (0.041). When computed at the individual level, the mean I was 0.1890. The results indicate that the genetic differences among different groups were small and the genetic diversity was not very rich.

Analysis of Nei's Genetic Distance between Species
Nei's genetic distance (D) is a measure of genetic difference among biological populations and can be measured in terms of quality traits and also with quantitative traits. The estimation of genetic distance is important for exploring the origins of cultivars, analyzing the relationships among populations, mapping phylogenetic trees and predicting heterosis, and guiding parental selection. The range of genetic identity among species was 0.8616-0.9719, calculated from 285 amplified fragments. As shown in Figure 2, S1 and S2 had the smallest genetic distance (0.0285) and the largest genetic identity (0.9719) with the closest relatives, followed by S1 and S5. S5 and S4 had the largest genetic distance (0.1490), shared the least genetic identity (0.8616), and were the most distantly related, followed by S4 and S2. The results of AMOVA indicated that most of the genetic variation (77%) occurred within species and only a small fraction (23%) occurred among species (Table 5). In addition, there were significant differences within and among groups. The mean fixation index (F st ) among five groups showed moderate genetic differentiation (F st = 0.231).
had the smallest genetic distance (0.0285) and the largest genetic identity (0.9719) with the closest relatives, followed by S1 and S5. S5 and S4 had the largest genetic distance (0.1490), shared the least genetic identity (0.8616), and were the most distantly related, followed by S4 and S2. The results of AMOVA indicated that most of the genetic variation (77%) occurred within species and only a small fraction (23%) occurred among species (Table 5). In addition, there were significant differences within and among groups. The mean fixation index (Fst) among five groups showed moderate genetic differentiation (Fst = 0.231).

UPGMA and PCoA Analysis
Based on Nei's genetic distances among individuals and groups, the clustering analysis among individuals was accomplished using the aboot method of the R package poppr, by selecting Nei's distance and bootstrapping 1000 times. Cluster analysis among populations was subjected to UPGMA trees drawn using the phylip software. According to the genetic distances, a phylogenetic tree was built ( Figure 3). As can be seen in Figure  3A, most individuals from S1 grouped together; S2, S5, and a small part of S1 were clustered together; individuals of S3 and S4 grouped together. The phylogenetic tree obtained

UPGMA and PCoA Analysis
Based on Nei's genetic distances among individuals and groups, the clustering analysis among individuals was accomplished using the aboot method of the R package poppr, by selecting Nei's distance and bootstrapping 1000 times. Cluster analysis among populations was subjected to UPGMA trees drawn using the phylip software. According to the genetic distances, a phylogenetic tree was built ( Figure 3). As can be seen in Figure 3A, most individuals from S1 grouped together; S2, S5, and a small part of S1 were clustered together; individuals of S3 and S4 grouped together. The phylogenetic tree obtained with Nei's genetic distance classified the species into three main clades ( Figure 3B). The first clades included S1, S2, and S5; the other two were S3 and S4. Among them, C. oleifera was clustered into one subclade, C. vietnamensis and C. crapnelliana were clustered into one subclade, and C. chekiangoleosa and C. gauchowensis were clustered into one subclade. In addition, some C. oleifera and C. vietnamensis were clustered into one subclade. Furthermore, two-dimensional PCoA revealed four distinct clusters on the basis of Nei's genetic distance among individuals (Figure 4). PCoA analysis reflects the variability between two samples or two groups by an intuitive comparison of the straight-line distances between samples in the coordinate axis, which indicates whether the two samples or two groups of samples are notably divergent. PCoA of the first three axes explained 17.61% of the total variation (7.30%, 6.68%, and 3.63%, respectively). The results of PCoA were relatively similar to the individual-based phylogenetic tree. S1 (C. oleifera) samples were clustered together, S2 (C. vietnamensis) and S5 (C. gauchowensis) samples were clustered together, S3 (C. chekiangoleosa) samples were clustered together, and S4 (C. crapnelliana) samples were clustered together. samples in the coordinate axis, which indicates whether the two samples or two groups of samples are notably divergent. PCoA of the first three axes explained 17.61% of the total variation (7.30%, 6.68%, and 3.63%, respectively). The results of PCoA were relatively similar to the individual-based phylogenetic tree. S1 (C. oleifera) samples were clustered together, S2 (C. vietnamensis) and S5 (C. gauchowensis) samples were clustered together, S3 (C. chekiangoleosa) samples were clustered together, and S4 (C. crapnelliana) samples were clustered together.

Linkage Disequilibrium Analysis and Population Structure
In linkage disequilibrium, there is a shift between the probability that a haplotype will appear and the probability that it will be randomly combined. The extent of this offset determines the extent of linkage disequilibrium. The degree of linkage disequilibrium was characterized by the square of the R value, which, when equal to 0, indicates complete linkage equilibrium-independent inheritance. When the R-squared equals 1, it indicates complete linkage disequilibrium. All 24 SSR loci were in linkage disequilibrium with each other; the a maximum R-squared was 1, and a minimum R-squared was 0.0099 ( Figure 5 and Table S3).
The results of STRUCTURE showed a clear maximum for Ln(PD)-derived delta K (∆K) at K = 3 ( Figure 6A,B), and this was considered as a possible number for the population of oil-tea camellia. Therefore, it indicated that the studied accessions belonged to

Linkage Disequilibrium Analysis and Population Structure
In linkage disequilibrium, there is a shift between the probability that a haplotype will appear and the probability that it will be randomly combined. The extent of this offset determines the extent of linkage disequilibrium. The degree of linkage disequilibrium was characterized by the square of the R value, which, when equal to 0, indicates complete linkage equilibrium-independent inheritance. When the R-squared equals 1, it indicates complete linkage disequilibrium. All 24 SSR loci were in linkage disequilibrium with each other; the a maximum R-squared was 1, and a minimum R-squared was 0.0099 ( Figure 5 and Table S3).   The results of STRUCTURE showed a clear maximum for Ln(PD)-derived delta K (∆K) at K = 3 ( Figure 6A,B), and this was considered as a possible number for the population of oil-tea camellia. Therefore, it indicated that the studied accessions belonged to three different clusters ( Figure 6C). Among them, most of C. oleifera samples were clustered in one population; a small proportion of C. oleifera samples were clustered separately; C. vietnamensis, C. gauchowensis, C. crapnelliana, and C. chekiangoleosa were another cluster. The results of population structure analysis were similar to those of UPGMA analysis ( Figure 3A).

Discussion
In this research, the genetic diversity of five oil-tea camellia species was analyzed by using SSR markers. The range of alleles in SSR was 6-31. A total of 385 alleles were found. An average of 16.0417 alleles were found for each SSR-primer pair. At the species level, the range of Ip of oil-tea camellia was 0.041-0.214, and the range of p was 7.79-79.48% (the mean value was 34.49%), showing moderate genetic diversity. When the polymorphism information content (PIC) was less than 0.25, SSR primers showed little polymorphism; when 0.25 < PIC < 0.5, moderate polymorphism; when PIC > 0.5, high polymorphism [34]. The results from the amplification of 345 pairs of SSR primers by Shi et al. [16] indicated that the proportion of polymorphic sites (31.9%) was relatively high. Chai et al. analyzed six natural populations of C. pubipetala, and the results showed that the I value was 0.4100; the PPB (percentage of polymorphic bands) was 80.43% [35]. Although the six populations' distribution was narrow, the genetic diversity was high. A total of 495 alleles were identified by 111 SSR loci in C. japonica, and the range of alleles was 1-12. The mean was 4.46 alleles per locus. The range of PIC was 0.15-0.86, and the average was 0.59 [15]. The mean of p in this study was 34.49%, which is similar to the above results. The ranges of Ne, h, and Ip of 24 markers in this study were 1.041-1.197, 0.027-0.128, and 0.041-0.214, respectively. The differences are larger when compared with the results of Dong et al., who used 16 SSR marker pairs for 54 oil-tea trees (including C. polyodonta, C. oleifera, C. gauchowensis, and C. semiserrata) for genetic diversity analysis. The ranges of Ne, h, and I were 1.17-1.70, 0.14-0.40, and 0.26-0.59, respectively [3]. In conclusion, the SSR primers in this study showed moderate to high levels of polymorphism, which indicates that they were suitable for genetic diversity analysis of oil-tea camellia trees.
Accurate genetic relationships among germplasm accessions are important for variety development, evolutionary studies, and resource conservation [31,36]. Three main clusters were determined by the UPGMA method on 140 samples. C. vietnamensis was clustered with C. gauchowensis, which is similar to the findings of Qi et al. [37] and Chen et al. [1]. They found that various indexes of leaf, flower, fruit, and seed morphologies of C. vietnamensis collected from Hainan Province showed high similarity to those of C. gauchowensis, whose provenance was Gaozhou in Guangdong Province [37], and C. vietnamensis and C. gauchowensis were found to be clustered together by cpDNA sequences and SSR marker analysis [1], so it could be speculated that the relative proximity of C. vietnamensis and C. gauchowensis to each other was quite near. Dai et al. analyzed the chloroplast genome trnH-psbA and matK sequences of 101 different kinds of oil-tea camellia seedlings by DNA barcoding technology [38]. They found that C. vietnamensis was clustered into one branch and C. chekiangoleosa was clustered into another, and the clustering results in this study agree strongly with these results. The findings suggest that C. vietnamensis in Hainan has a relatively close relative. Additionally, the phenomenon of self-incompatibility might occur in close relatives, which might be one of the reasons for the low seed-setting rate of C. vietnamensis in Hainan.
For a more accurate analysis of the genetic structure of oil-tea camellia, STRUCTURE was used for further analysis, and the results indicated that the 140 accessions were classified into three clusters. Among them, most of C. oleifera samples were clustered one population; a small proportion of C. oleifera were in another cluster; and C. vietnamensis, C. gauchowensis, C. crapnelliana, and C. chekiangoleosa were one cluster. There were small fractions of C. oleifera samples that clustered with other Camellia species. It was indicated that plants of the same group came not only from the same region but also from different regions. Possibly, species with different genetic backgrounds may cluster together. This indicates that the kinship of germplasm is extremely complex. The reasons for this phenomenon might be as follows. First, occasional genetic mutations and long-term natural selection have made finding the relatives of oil-tea camellia more complicated. Second, the effect of genetic drift was greater during the natural differentiation of oil-tea camellia than those of natural environmental factors, leading to the failure to divide by geographic region when clustering. Using indirect measures, the gene flow among populations was estimated by the value of Nm [28]. The Nm value (0.7666) indicated low gene flow among species and might promote population differentiation. When Nm < 1, genetic drift is thought to be a major contributor to population differentiation [26,28]. Third, the uneven number of selected samples makes the clustering result not accurate enough, and so on.
The results of genetic structure analysis indicate that the genetic variation of oil-tea camellia samples mainly appeared within species, accounting for 77% of the total variation, leaving only a small portion (23%) occurring among species. That might result from habitat fragmentation and geographical barriers. Some experts have also obtained similar results with other Camellia plants. He et al. used nine pairs of SSR primers to analyze 150 accessions of C. oleifera, and the results indicated that the genetic diversity level of C. oleifera is high [17]. In addition, Li et al. also analyzed 84 accessions of eight natural populations of C. fascicularis with fourteen pairs of primers for SSR markers [18]. The results indicated that the eight populations of C. fascicularis were roughly divided into three clusters, and the genetic variation within populations accounted for 49.95% of variation. To sum up, the results of this study indicate that the genetic variation of oil-tea camellia samples was mainly found within populations, and inbreeding occurred within the population, such as with C. vietnamensis. The degree of gene exchange among species was low.

Conclusions
In this research, 24 pairs of SSR primers were selected to analyze the genetic relationship and population structure of 140 oil-tea camellia accessions using fluorescence detection by capillary electrophoresis. The results indicate that genetic diversity was abundant among the 140 Camellia accessions. Based on genetic distances and clustering by UPGMA, the 140 accessions could be classified into three clusters. Most individuals from S1 grouped together, samples from S2 and S5 grouped together, and samples from S3 and S4 formed the same branch. In addition, some individuals from S1 and S2 were clustered together, which relates to the results of the PCoA. The Bayesian-model-based genetic structure analysis indicated that the studied accessions belonged to three populations. Among them, most of the C. oleifera samples were clustered into one population; a small proportion of C. oleifera were in another cluster; C. vietnamensis, C. gauchowensis, C. crapnelliana, and C. chekiangoleosa were one cluster. Taken together, the findings should be instructive for oil-tea camellia species' introduction, breeding, germplasm preservation, and new-variety development, and provide a theoretical foundation for the classification and identification of oil-tea camellia species in southern China and the research on relatives.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/genes13112162/s1, Table S1: Genetic diversity parameters of each SSR locus; Table S2: Genetic diversity parameters of each SSR locus of each specie; Table S3: Linkage disequilibrium of 24 SSR loci.