Exploring the Phylogeography of Ancient Platycladus orientalis in China by Specific-Locus Amplified Fragment Sequencing

Platycladus orientalis (i.e., Chinese thuja) is famous for its lifespan spanning hundreds, and even thousands, of years. Most ancient P. orientalis populations are widely distributed in China, with accessible historical records, making them valuable genetic resources. In this study, the distribution pattern of ancient P. orientalis in China was analyzed based on 13 bioclimatic factors. Additionally, a specific-locus amplified fragment (SLAF) sequencing method was applied to detect single nucleotide polymorphisms (SNPs) in the genomes of 100 accessions from 13 populations. The resulting data revealed that the suitable areas for the distribution of ancient P. orientalis populations were accurately predicted with four main climatic factors. A total of 81,722 SNPs were identified from 461,867 SLAFs for 100 individuals, with an average sequencing depth of 10.11-fold and a Q30 value of 82.75%. The pair-wise genetic distance and genetic differentiation of 13 populations indicated that the BT-T population exhibited the largest divergence from the other populations. A neighbor-joining phylogenetic tree suggested the relationship between many individuals was inconsistent with the geographical location, possibly indicative of a history of transplantation and cultivation. All individuals were clustered into nine genotypes according to a structural analysis and the relationships between individuals were clarified in phylogenetic trees. This study highlights the importance of the de novo genome sequencing of ancient P. orientalis and may provide the basis for the conservation of P. orientalis genetic resources, the identification of supergene families, and the evaluation of related genetic resources.


Introduction
Platycladus orientalis (L.) Franco (common name: Chinese thuja) is an evergreen tree species of the family Cupressaceae that is endemic to China, with the common name of Chinese thuja. This species is one of the most widely used trees for landscaping in China, and it was planted in some temples, mausoleums, and gardens thousands of years ago. Additionally, P. orientalis is highly resistant to drought and cold conditions, as well as diseases, and is widely distributed throughout China, with the exception of Qinghai and Xinjiang provinces [1,2]. It is famous for its long lifespan of hundreds,

Potential Tree Distribution and Effects of Climatic Factors
The maximum entropy model was used for predicting the areas in which the ancient P. orientalis populations in China are distributed. The ROC curve of the model is presented in Figure 1A. The AUC for the training data subset and the test data subset was close to 1 (0.906 and 0.938, respectively), which indicated the models can accurately distinguish the distribution of the ancient P. orientalis. On the basis of the jackknife test results ( Figure 1B), four climatic factors were identified as the main influencing climatic factors (i.e., Bio1: Annual mean temperature, Bio5: Maximum temperature of the warmest month, Bio8: Mean temperature of the wettest quarter, and Bio10: Mean temperature of the warmest quarter). The simulated distribution of the ancient P. orientalis populations in China is presented in

Statistical Analysis of the Sequencing Data
The distribution of sequence bases is presented in Supplementary Figure S1. The base distribution fluctuated depending on the restriction enzyme site and PCR, especially the first two bases. Digestion efficiency is a key index for evaluating the success of SLAF-seq. Our results suggested that 99.6% normal digestion corresponds to a highly efficient digestion. Additionally, more than 42.13 million reads for 100 individuals were generated in this study. As mentioned in the

Statistical Analysis of the Sequencing Data
The distribution of sequence bases is presented in Supplementary Figure S1. The base distribution fluctuated depending on the restriction enzyme site and PCR, especially the first two bases. Digestion efficiency is a key index for evaluating the success of SLAF-seq. Our results suggested that 99.6% normal digestion corresponds to a highly efficient digestion. Additionally, more than 42.13 million reads for 100 individuals were generated in this study. As mentioned in the

Statistical Analysis of the Sequencing Data
The distribution of sequence bases is presented in Supplementary Figure S1. The base distribution fluctuated depending on the restriction enzyme site and PCR, especially the first two bases. Digestion efficiency is a key index for evaluating the success of SLAF-seq. Our results suggested that 99.6% normal digestion corresponds to a highly efficient digestion. Additionally, more than 42.13 million reads for 100 individuals were generated in this study. As mentioned in the Materials and Methods section, a Q value of 30 indicated the error rate was 1/1000. The Q30 value (Q < 30) and GC content for the reads were 82.75% and 34.43%, respectively.

SLAF Tags and SNP Sites
In this study, a total of 42,299 high-quality SLAF tags were identified after a sequence alignment with the reference genome. The high-quality SLAFs were selected with a total depth threshold of 1000-fold, and the average sequencing depth was 10.11-fold. Of these SLAFs, 32,870 were identified as polymorphic, and 9249 were identified as nonpolymorphic (Table 1, Supplementary File S1). Finally, 81,722 SNPs with a minor allele frequency ≥0.05 and integrity ≥80% were derived from 461,867 SLAFs for the 100 accessions, with a heterozygosity ratio of 0.0198.

Genome-Wide Estimates of Genetic Diversity
The pair-wise genetic distance (Ds) and genetic differentiation (Fst) of 13 ancient P. orientalis populations were calculated ( Table 2). The Ds calculated with Nei's method ranged from 0.229 to 0.365, with an average of 0.288. The largest genetic distance (0.365) was associated with the Zhongshan Park (BJ-Z) and Weiwu Temple (GS-W) populations, followed by the BJ-Z and Jinci Temple (SX-J) populations. The smallest genetic distance (0.229) was calculated for the Cemetery of Confucius (SD-C) and Mengmu Cemetery (SD-M) populations. The Fst value ranged from 0.024 to 0.386, with an average of 0.147. The highest Fst value (0.386) was recorded for the BJ-Z and SX-J populations, followed by the Songyang Academy (HN-S) and SX-J populations. The lowest Fst value (0.024) was calculated for the Tiantan Park (BJ-T) and Wofuo Temple (BJ-W) populations. Moreover, the Ds and Fst values between BJ-Z and the other populations were highest and were greater than the average values. Additionally, the Pi and Tajima's D value of all populations were calculated and list in Supplementary Files S2 and S3. Table 2. Estimates of evolutionary divergence (Ds) (down) and genetic differentiation (Fst) (up) over sequence pairs and between groups. To ascertain the divergence and relationships among the populations, a phylogenetic tree consisting of 100 accessions was constructed based on the identified SNPs (Supplementary Figure S2). The BJ-Z individuals were grouped together and were separated from the other populations, whereas the Xuanyuan Temple (SX-X), SX-J, and SD-M individuals were also grouped together, but they were also clustered with other populations. The relationship between many individuals was inconsistent with the geographical location. For example, Ditan Park (BJ-D) individuals were grouped with individuals from various populations. This result may have been because the younger trees were the progeny of other older trees and some younger trees may have been transplanted from other areas. Therefore, phylogenetic trees for individuals over 500 and 1000 years old were constructed to resolve the relationships among older individuals. Accordingly, 70 and 39 individuals over 500 and 1000 years old, respectively, were identified, and the corresponding phylogenetic trees ( Figure 3A)    On the basis of the SNP data, the Admixture software was used to analyze the genetic structure of the ancient P. orientalis populations. A total of 100 accessions were divided according to the cross-validation error rate (Supplementary Figure S3A). Additionally, K was initially set as 1-30 prior to clustering (Supplementary Figure S3B). The peak value of ∆K was used to determine the number of groups. According to the results, when K = 9, nine separate groups were identified, with each group including 3-32 members. The length of each colored segment in Figure 4 represents the proportion of an individual's genome that was derived from nine ancestral populations. Groups 8 and 9 consisted of the individuals from the GS-W and SX-J locations, respectively, whereas Groups 5, 6, and 7 comprised the individuals from BJ-Z. Group

Discussion
In this study, the 100 ancient P. orientalis accessions collected from 13 sites represented the prevailing ancient P. orientalis populations in China. All trees were well protected and had available historical records, thereby ensuring the accuracy of the analysis. The distribution of these samples was used to predict the potential distribution in other suitable areas. A model accuracy test confirmed that the simulation results were of a high standard, implying the developed model was appropriate for the objectives of this study. On the basis of the generated data, we predicted the regions in China potentially suitable for ancient P. orientalis populations. In this prediction, Bio1 (Annual mean temperature), Bio5 (Maximum temperature of the warmest month), Bio8 (Mean temperature of the wettest quarter), and Bio10 (Mean temperature of the warmest quarter) were considered the climate

Discussion
In this study, the 100 ancient P. orientalis accessions collected from 13 sites represented the prevailing ancient P. orientalis populations in China. All trees were well protected and had available historical records, thereby ensuring the accuracy of the analysis. The distribution of these samples was used to predict the potential distribution in other suitable areas. A model accuracy test confirmed that the simulation results were of a high standard, implying the developed model was appropriate for the objectives of this study. On the basis of the generated data, we predicted the regions in China potentially suitable for ancient P. orientalis populations. In this prediction, Bio1 (Annual mean temperature), Bio5 (Maximum temperature of the warmest month), Bio8 (Mean temperature of the wettest quarter), and Bio10 (Mean temperature of the warmest quarter) were considered the climate factors with the strongest influences. In fact, the main disease causing the death of P. orientalis trees is leaf blight, which is caused by biotic factors. This disease occurs at elevated temperatures, especially during the warmest quarter (6-9), which is also the wettest quarter. In this quarter, the temperature and humidity can promote the propagation and spread of pathogenic organisms, with Semanotus bifasciatus Motschulsky and Pityogenes spp. L. infecting trees more easily than normal. These infections often accelerate the death of P. orientalis populations. Consequently, temperature is an important factor affecting the distribution and reproduction of P. orientalis, especially under humid conditions. This has been confirmed by our predictions, which may be relevant for identifying the factors influencing the predicted appropriate distribution. Thus, our findings provide the basis for future investigations regarding the protection of ancient P. orientalis.
Advances in high-throughput next-generation sequencing have allowed researchers to generate genomic marker datasets for particular species. These datasets can be applied to address a broad range of questions related to population genetics and evolution more precisely than with traditional methods. In this study, 100 accessions were analyzed with a SLAF-seq method, with an average sequencing depth of 10.11-fold that confirmed the accuracy of the analysis [15,20]. The Q30 value of 82.75% verified the reliability of our results. A total of 81,722 SNPs were identified from the 461,867 SLAFs for 100 individuals, which is consistent with the results of a SLAF-seq investigation of soybean [15]. In a previous study of a P. orientalis population involving a more traditional method, 1,680 polymorphic bands were identified based on AFLPs [21]. Additionally, 34 polymorphic sites were identified in a Thuja sutchuenensis Franch. population based on a RAPD method [22]. The number of SNPs identified in the current SLAF-seq study, with a high genome coverage, far exceeded the number of polymorphisms detected using traditional methods.
According to our phylogenetic analysis, all of the individuals from the BJ-Z population clustered together without other individuals (Supplementary Figure S2), suggesting the origin of the BJ-Z population differs from that of the other populations; the pair-wise Ds and Fst values support this suggestion. The genetic structural analysis revealed that the BJ-Z population comprises three genotypes, which is in accordance with the phylogenetic results. Some populations (e.g., HN-S and BJ-T) consisted of two or more genotypes, indicative of a relatively high gene flow.
To accurately clarify the selection between populations, genome-wide variations were measured based on the SNPs between the BJ-Z and SD-D populations. The θπ ratios and Fst values were calculated with a sliding window analysis (100 kb windows and 10 kb steps). The threshold values for selecting genomic regions was set as the 5% left and right tails of the empirical θπ ratio distribution (the θπ ratios were 0.24 and 3.03). Regions above the horizontal dashed line (the 5% right tail of the empirical Fst distribution, where Fst is 0.361) were identified as appropriate regions for BJ-Z (blue points) and GS-W (green points) populations (Supplementary Figure S4; Supplementary File S4).
This study highlights the value of the de novo sequencing of ancient P. orientalis genomes to elucidate the genomic patterns in a phylogenetic framework. The SLAF-seq analysis of various ancient P. orientalis populations revealed the genetic diversity and the geographically distinct demographic histories of the trees, which were consistent with the results of phylogenomic analyses. The data presented herein may represent the basis for future studies related to the conservation of P. orientalis genetic resources, the identification of supergene families, and the evaluation of related genetic resources.

Sample Preparation
A total of 100 ancient P. orientalis accessions were included in the present study to represent the genetic and geographical diversity in China. The sample collection sites ranged from 34 • N to 40 • N and from 102 • E to 117 • E. All samples were collected from historical cultural monuments in China, including temples, mausoleums, and royal gardens that have existed for thousands of years (Table 3). For the purpose of this study, P. orientalis trees that were over 100 years old were considered as ancient trees. Tree age was based on the diameter at breast height (DBH) of selected trees as well as historical records available from local government departments responsible for managing ancient trees. All individuals included in this study were over 100 years old, including 30 individuals 100 to 500 and 500 to 1000 years old as well as 40 individuals 1000 to 3000 years old. The DBH of these trees exceeded 80 cm, and most of the trees were planted for greening and celebrations. The DBH of the oldest P. orientalis tree (approximately 3000 years old) was 7800 cm. New leaves were collected from the south-facing middle canopy layer of pest-and disease-free trees exposed to similar daily illumination times. Samples were flash frozen in liquid nitrogen.

Potential Distribution Predictions
A maximum entropy model was used to predict the potential distribution of ancient P. orientalis. This investigation involved a total of 13 sites. Environmental data were derived from the WorldClim database (http://www.worldclim.org/). Additionally, 19 bioclimatic variables were considered in this research (Table 4), with a layer resolution of 30". An administrative map of China (1:400 million) was obtained from the National Geomatics Center of China. The maximum entropy model was constructed with the Maxent (version 3.4.1) program, and 75% of the samples were used to establish a training subset, whereas the remaining samples were used to verify the reliability of the model. Isothermality (BIO2/BIO7) (* 100) Bio4 Temperature Seasonality (standard deviation *100) Bio5 Max Temperature of Warmest Month Bio6 Min Temperature of Coldest Month Bio7 Temperature Annual Range (BIO5-BIO6) Bio8 Mean Temperature of Wettest Quarter Bio9 Mean Temperature of Driest Quarter Bio10 Mean Temperature of Warmest Quarter Bio11 Mean Temperature of Coldest Quarter Bio12 Annual Precipitation Bio13 Precipitation of Wettest Month Bio14 Precipitation of Driest Month Bio15 Precipitation Seasonality (Coefficient of Variation) Bio16 Precipitation of Wettest Quarter Bio17 Precipitation of Driest Quarter Bio18 Precipitation of Warmest Quarter Bio19 Precipitation of Coldest Quarter The receiver operating characteristic (ROC) curve was drawn after the maximum entropy model was constructed, and the area under the curve (AUC) was used to test the accuracy of the model. The evaluation criteria for the AUC were as follows [23,24]: Failed (0.5-0.6), poor (0.6-0.7), common (0.7-0.8), good (0.8-0.9), and great (0.9-1.0). Additionally, the jackknife method was used to identify the climatic factors with the biggest influence [25]. Finally, the ancient P. orientalis distribution areas were divided into the following four grades according to the natural discontinuity method: Not suitable, low suitability, moderate suitability, and highly suitable.

SLAF Library Construction and Sequencing
For each accession, the CTAB method was used to isolate DNA from the dried leaves of a single plant for a subsequent SLAF-seq analysis [19]. Because the P. orientalis genome has not been sequenced, the Picea glauca genome was selected as the reference sequence for predicting restriction enzyme sites (genome size: 24.63 Gb; GC content: 38.51%) [26] based on a previous study [27]. The restriction enzymes and sizes of the digested fragments were evaluated according to the following criteria: 1) SLAF tags per genome >10,000; 2) even distribution in unique genomic regions; and 3) repeated SLAFs avoided as much as possible. Digested genomic fragments with differing lengths were simulated in silico, after which restriction enzymes were selected based on the uniqueness and uniformity of the alignment of the simulated fragments to the reference genome. Accordingly, two restriction enzymes (EcoRV and ScaI) were selected, and a 464 to 494 bp fragment was identified as a SLAF tag. A total of 55,421 SLAF tags were predicted. For each sample, 10 µg genomic DNA (≥100 ng µl −1 ) was digested and incubated at 37 • C with EcoRV [New England Biolabs (NEB), Ipswich, MA, USA], T4 DNA ligase (NEB), ATP (NEB), and an ScaI adapter. The restriction-ligation reactions were heat-inactivated at 65 • C, and the samples were digested with ScaI at 37 • C. The restriction-ligation samples (diluted) were used as the template for a PCR amplification with Taq DNA polymerase (NEB), dNTPs, and an EcoRV-primer containing a barcode. The E.Z.N.A. ® Cycle Pure Kit (Omega Bio-Tek, Norcross, GA, USA) was used to purify the PCR products. Pooled samples were incubated at 37 • C with EcoRV, T4 DNA ligase, ATP, and a Solexa ™ adapter (Illumina Co., San Diego, CA, USA), after which they were purified with a Quick Spin column (Qiagen, Hilden, Germany) and separated on a 2% (w/v) agarose gel. Fragments that were 400-500 bp were isolated with the Gel Extraction Kit (Qiagen). These fragments were then used as the template for a PCR amplification with the Phusion ™ Master Mix (NEB) and the Solexa Amplification primer mix to add the barcode. The amplified fragments that were 450-500 bp long were excised and purified as previously described and diluted for the paired-end sequencing, which was completed with the Genome Analyzer II high-throughput sequencing platform (Illumina, Inc., San Diego, CA, USA).

Sequencing Data Processing and SNP Calling
The sequence error rate was estimated using rice (Oryza sativa indica) data as a control to prevent false positive results [28]. The Q value was used to evaluate the single-base error rate of high-throughput sequencing [Q = −10 × log10e (e represents the single-base error rate)]. The base distribution of SLAF-seq results was also examined according to the restriction sites and PCR. The reference sequence was selected based on the sequencing depth of each SLAF tag, and all reads were aligned to the reference with the Burrows Wheeler Aligner [29]. The Genome Analysis Toolkit [30] and SAMtools [31] were used for SNP calling, and the resulting data were combined to form the SNP dataset. The SNPs with a minor allele frequency <5% and integrity <80% were excluded from the dataset.

Phylogenetic Analysis
Phylogenetic relationships were deduced from the SLAF-seq data. The SNPs were used to construct phylogenetic trees for all accessions according to the neighbor-joining method [32] in the MEGA5 program [33]. The genetic structures were analyzed with the Admixture software [34]. A principal component analysis was performed based on the SNP differences in individual genomes to divide the accessions into subpopulations with R (3.4.4). To analyze the organization of the populations based on multilocus genotype information, the genetic population structure was examined with the Admixture software [34]. To clarify the evolutionary process, K (genetic clusters) was predefined as 1-30 before clustering. The cross-validation error rate (∆K) was used to determine K (minimum value). We also imported the same set of SNPs into the GenoDive program [35] to perform various population differentiation tests.

Conclusions
In summary, we herein reveal the genetic diversity of ancient P. orientalis at the population and species level. Considerable genetic differentiation was observed among populations. A phylogenetic tree suggested the relationship between many individuals was inconsistent with the geographical location, possibly indicative of a history of transplantation and cultivation. All individuals were clustered into nine genotypes according to a structural analysis. On the basis of these findings, strategies have been proposed for conserving ancient P. orientalis populations.