Comparative and Phylogenetic Analysis of the Complete Chloroplast Genome of Santalum (Santalaceae)

: Santalum (Santalaceae, sandalwood) is a hemiparasitic genus that includes approximately 15 extant species. It is known for its aromatic heartwood oil, which is used in incense and perfume. Demand for sandalwood-based products has led to drastic over-harvesting, and wild Santalum populations are now threatened. Knowledge of phylogenetic relationships will be critical for the conservation and proper management of this genus. Here, we sequenced the chloroplast genome of 11 Santalum species. The data were then used to investigate chloroplast genome evolutionary dynamics and relationships and divergence time within Santalum and related species. The Santalum chloroplast genome contains typical quadripartite structures, ranging from 143,291 to 144,263 bp. The chloroplast genome contains 110 unique genes. The whole set of ndh genes and the infA gene were found to lose their functions. The P-distance among the Santalum species was 0.0003 to 0.00828. Three mutation hotspot regions, 14 small inversions, and 460 indels events were discovered in the Santalum chloroplast genome. Branch-model-based selection analyses showed that the Santalum species were under widespread purifying selection. Our phylogenomic assessment provides an improved resolution to the phylogenetic relationships of Santalum compared to the past analyses. Our divergence time analysis showed that the crown age of Santalum was 8.46 Mya (million years ago), the ﬁrst divergence occurred around 6.97 Mya, and diversiﬁcation was completed approximately 1 Mya. By sequencing the 11 Santalum species chloroplast genomes, we identiﬁed the variations in the Santalum chloroplast genomes. Using the chloroplast genome sequences, phylogeny and divergence time analyses discovered that the Santalum species were likely to originate due to radiation evolution, and most speciation events occurred less than 1 Mya. Functional Gene Losses Occur with Minimal Size Reduction in the Plastid Genome the Liverwort


Introduction
Sandalwood (Santalum L., Santalaceae) is known for its aromatic heartwood oil and is used in incense and perfume [1]. Santalum comprises 15 extant species and 14 varieties [2]. All sandalwoods are hemiparasitic plants, taking a portion of their water and nutrients from the roots of host plants. Santalum is distributed throughout India, Australia, and the Pacific Islands [3,4]. The Hawaiian Islands (with four species and four varieties) and Australia (with six species) were the distribution centers of the genus [2,5].
The sandalwood species were heavily exploited because of their high value and the demand for the valuable sandalwood oil. Santalum fernandezianum F.Phil., endemic to Juan Fernandez Islands, became extinct during the last century due to human exploitation [6]. Santalum freycinetianum Gaudich. var. lanaiense Rock from the Hawaiian Islands, S. insulare Bertero ex A.DC. var. hendersonense (F.Br.) Fosberg & Sacheét from Henderson Island, S. boninense (Nakai) Tuyama from the Bonin Islands, and S. insulare from the Cook Islands and French Polynesia [7] are now rare or threatened by extinction [8,9].
Santalum has been classified into five sections (Santalum, Solenantha, Hawaiiensia, Polynesica, and Eucarya) according to their morphological characters (Table S1) [3,4]. Section retained photosynthesis, and their chloroplast genomes have only lost some genes, such as ndh [33]. A comparison of chloroplast genome sequences can help in understanding the evolutionary patterns of hemiparasitic plants.
To have a better understanding of the relationship within Santalum and to gain insight into the pattern of Santalum chloroplast genome evolution, we sequenced the chloroplast genome of 11 species of Santalum. Specifically, we attempted to (1) investigate the relationships within Santalum; (2) estimate the divergence time of Santalum; and (3) elucidate chloroplast genome evolution within hemiparasitic plants of Santalum.

Plant Materials and Sequencing the Chloroplast Genomes
We collected 12 individual samples, representing 11 currently described Santalum species. These species covered four sections of Santalum. Details of the 12 samples collected in this study are given in Supplemental Table S2. Specimens of these samples were preserved in the herbarium of the Research Institute of Tropical Forestry, Chinese Academy of Forestry. Xiaojin Liu identified all samples. Leaf tissues were dried using silica gel for subsequent DNA extraction. These materials were from cultivated plants, and permission was not required to collect them. We used the low-coverage whole-genome sequencing method to obtain the whole chloroplast genome. DNA was extracted using the modified CTAB (cetyl trimethylammonium bromide) DNA extraction protocol [34].
The qualified and purified DNA was fragmented by nebulization with compressed nitrogen gas, yielding fragments that were 350 bp in length. Paired-end libraries were prepared by the DNA Library Preparation Kit (Illumina, San Diego, CA, USA) with the following steps. First, the DNA fragments were end-repaired, phosphorylated, and A-tailed. Second, adapters were then ligated with index adapters. Finally, the ligated fragments were amplified for library construction. Sequencing was performed using the Illumina X-ten platform at Novogene. Each sample yielded about 4 Gb (Giga base) of data.

Chloroplast Genome Assembly and Annotation
The raw reads were filtered using Trimmomatic v0.36 [35] to remove the adaptors, low-quality reads and sites. The parameters were set as: LEADING = 20, TRAILING = 20, SLIDINGWINDOW = 4:15, MINLEN = 36, and AVG QUAL = 20. The clean data were used to assemble the chloroplast genome using GetOrganelle [36]. The chloroplast genome reads from the clean data were mapped to the complete chloroplast genome using Geneious Prime v2020.0.5. and the number of chloroplast genome reads were used to calculate the sequencing coverage of chloroplast genome.
Plant mitogenomes are notable for both their extraordinary variations in genomic size and their remarkable variability in structure and organization [37]. According to the results of Kan et al. [38], it is difficult to assemble a whole plant mitochondrial genome based on Illumina data. In this study, we did not analyze the mitochondrial genome.
The newly sequenced chloroplast genomes were annotated using Plann [39] using Santalum album (GenBank Accession number: MK675809) as the reference. The chloroplast genome maps were visualized using OGDRAW [40]. The complete chloroplast genome sequences were deposited in GenBank (MW464914 to MW464925).

Genome Comparison
The mVISTA program was used to analyze the variations in the Santalum chloroplast genomes [41], using the chloroplast genome of S. leptocladum as a reference for sequence annotation (GenBank accession number: MW464918).
The whole Santalum chloroplast genome alignments were performed with MAFFT v7 [42] and adjusted manually. We used the genetic P-distances and the number of SNPs (single nucleotide substitutions) to assess the variance among the Santalum species. The P-distances and the number of SNPs were calculated using MEGA X [43]. To explore the mutation hotspots in the chloroplast genome, nucleotide diversity (π) was calculated using the software DnaSP v6 [44], via sliding window analysis with a window size of 800 bp and a step size of 100 bp. The primers of the mutation hotspots were designed and synthetized to test their working efficiency, following the method of Dong et al. [45].

Natural Selection Test
The ratio (ω) of non-synonymous (dN) to synonymous (dS) nucleotide substitution rate served as an indicator of nature selection in protein-coding genes. The values ω > 1, ω = 1, and ω < 1 indicated positive, neutral, and negative selection, respectively. The rate of dS and dN, andωwere used to evaluate the evolutionary rate of the different genes and gene groups. All coding genes were aligned with MAFFT and deleted the stop codon. DnaSP v6 [44] was used to analyze the dN, dS, and ω. The average values of all pairwise calculated dN and dS were used to define the evolutionary rates of gene or gene groups. Gene with same functions were grouped following previous studies [46][47][48]. We analyzed the following gene sets: (1) concatenating all 67 coding genes; (2) gene sets corresponding to the same functions, for example, atp, pet, psa, psb, rpl, rpo, rps; and (3) the single genes.
In order to compare selection pressures acting on the chloroplast genome of Santalum species, we applied the branch-model method, implemented by CodeML in Easy-CodeML [49], to estimate the ω of Santalum branches. We adopted two-radio models assuming that the branches of Santalum branches (foreground branches) had different ratios from the all other taxa (background branches). Significant difference was evaluated by likelihood ratio tests of maximum likelihood method [50]. The Maximum likelihood (ML) phylogeny tree based on the 70 g 50 s dataset with an hcluster partitioning scheme was used for the selection analyses.

Microstructural Mutation Events
Indels and small inversions were identified based on the aligned chloroplast genome sequence matrix, according to Dong et al. [51]. The indel types were divided into three categories: repeat-related indels, normal indels, and SSR-related indels (simple sequence repeat). Inversions were first identified using the REPtuter program and then checked and confirmed by reexamining the sequence matrix. Inversions form stem-loop structures, including the inversion sequences and inverted repeat.

Phylogenetic Analyses
Phylogenetic analysis was conducted to elucidate the interspecific phylogenetic relationships within Santalum. The complete chloroplast genome of other Santalales species were downloaded from GenBank, and coding genes were extracted using Geneious Prime v2020.0.5.
In order to compare the topology based on different datasets, we generated two datasets for phylogenetic inference. The first dataset (13CPG) was the 12 Santalum complete chloroplast genome sequences with Osyris wightiana Hochst. & Steud. as an outgroup. The second dataset (70 g 50 s) contained 66 coding genes and four rRNA genes, covering 12 Santalum samples and 38 Santalales species from nine other families.
Maximum likelihood (ML) and Bayesian inference (BI) methods were used to infer phylogenetic relationships. For the 13CPG dataset, the best-fit model was found with ModelFinder [52]. For the 70 g 50 s dataset, we used the following partitioning schemes: (1) unpartitioned; (2) partitioned by genes; (3) partitioned with the rcluster algorithm (data pre-partitioned by locus); and (4) partitioned with the hcluster algorithm (data prepartitioned by locus). All partitioning analyses were run in PartitionFinder 2 [53] under the model selection AICc, and with the branch length linked. RAxML-NG [54] was run for the ML tree with 500 bootstrap replicates. RAxML-NG is a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML and it offers improved accuracy and speed.
Mrbayes v3.2 [55] was used to perform the BI tree. For the 70 g 50 s dataset, we used the hcluster partitioning scheme for BI analyses because this scheme resulted in the highest log-likelihood in the ML analyses. The 13CPG dataset used the GTR + G model (the best-fit model from ModelFinder) for BI analyses. The BI analysis was run with two independent chains for 20 million generations with sampling every 1000 generations. The initial 25% of the sampled trees were discarded as burn-in. The stationarity was regarded as having been reached when the average standard deviation of split frequencies remained below 0.01. We used Tracer v1.6 [56] to evaluate convergence and ensure sufficient and effective sample sizes for all parameters surpassing 200.

Molecular Clock Dating
The 70 g 50 s dataset was used to estimate the divergence times of Santalales using five priors. The root age of the tree (crown age of Santalales) was set to 114 Mya (95% HPD: 112-116 Mya) according to the divergence time estimate of the angiosperms [57]. The stem age of Loranthaceae was constrained to 72 Mya (95% HPD: 70.4-73.6 Mya) based on the fossil of Cranwellia [58,59] and the results of Liu et al. [29], and the average age of the most recent common ancestor (MRCA) of Loranthaceae was set to 59 Mya (95% HPD: 57.4-60.6 Mya). The stem age of Viscaceae was 72 Mya (95% HPD: 70.4-73.6 Mya) according to Vidal-Russell and Nickrent [60]. The split age of Santalum and Osyris was set using the external calibration of 32-48 Mya, as estimated by the fossil-calibrated of Harbaugh and Baldwin [2] and Wikström et al. [57].
The divergence time of Santalales was determined in BEAST 2 [61]. BEAST analyses were done using the uncorrelated lognormal relaxed molecular clock model, which is considered the most appropriate model for species-level datasets [61]. The prior tree Yule model was selected, and the Markov Chain Monte Carlo (MCMC) tool was run for 400,000,000 generations with sampling every 10,000 generations. We conducted two separate MCMC runs and used Tracer v1.6 [56] to evaluate convergence and ensure sufficient and effective sample sizes for all parameters surpassing 200. A maximum credibility tree was then built using TreeAnnotator v2.4.7, with the initial 10% of trees discarded as burn-in.

Structural Characteristics of the Santalum Chloroplast Genome
The complete chloroplast genomes of Santalum species were assembled into circular molecules, contained the typical quadripartite structures (Supplemental Figure S1). The Santalum chloroplast genome sequencing coverage was from 205× to 8629× (Supplemental Table S2). The Santalum chloroplast genomes ranged from 143,291 bp (S. acuminatum (R.Br.) A.DC.) to 144,263 bp (S. boninense) in length (Table 1), with LSCs (large single copies) of 82,944 bp (S. acuminatum) to 83,942 bp (S. paniculatum), IRs (inverted repeat) of 24,477 bp (S. paniculatum) to 24,511 bp (S. album L.), and SSCs (small single copy) of 11,237 (S. leptocladum) to 11,379 bp (S. acuminatum). The overall GC content was 38.0%. The Santalum chloroplast genomes encoded a total of 110 unique genes, including 67 protein-coding genes, 30 tRNA genes, 4 rRNA genes, and 9 pseudogenes. The whole set of ndh genes and the infA gene were found to have lost their functions. The ndhA gene had a complete loss of function, and the other ndh genes and infA were pseudogenizations. Sixteen genes had introns, with two (ycf3 and clpP) harboring two introns.

Comparative Analyses of the Chloroplast Genome
The mVISTA results revealed collineation, no rearrangement, and high sequence similarity across the chloroplast genome ( Figure 1). There were 2352 variable sites in the 145,671 bp Santalum chloroplast genome alignment ( Table 2). The overall nucleotide diversity (π) was 0.0036. SSC exhibited the highest π value (0.00926) compared with the IR (0.00087) and LSC (0.00457) regions. The genetic p-distance and number of nucleotide substitutions among these ten Santalum species are given in Supplemental Table S3. The mean genetic distance was 0.00401, the lowest sequence divergence was between S. ellipticum and S. ellipticum var. littorale, and the largest sequence divergence was between S. spicatum and S. yasi.   To identify the mutation hotspots in the chloroplast genome, the nucleotide diversity values are shown in Figure 2. The number of single nucleotide substitutions ranged from 0 to 46, and the π values ranged from 0 to 0.01485 within an 800-bp sliding window size. We defined mutation hotspots with pi values > 0.012. There were three regions (ccsA-trnL, ΨndhE-ΨndhG-rps15, and ycf1), and these three regions were all located within the SSC region. Among these three regions, ccsA-trnL had the highest nucleotide diversity values. Sliding window test of nucleotide diversity (π) in the Santalum chloroplast genomes. The three mutation hotspot regions (π > 0.012) were annotated. The window size was set to 800 bp and the sliding windows size was 100 bp. X-axis, position of the midpoint of a window; Y-axis, π values of each window.

Natural Selection
Using the branch-model method, the Santalum branch had a significantly lower ω ratio (ω = 0.072; p < 0.001) than all other branches (ω = 0.151). This result indicates that Santalum may have been shaped by stronger purifying selection.
The average values of dS, dN and ω are shown in Supplemental Table S6. Among these genes, 11 genes had dS = 0, 23 genes had dN = 0, and 7 genes (petG, petN, psaC, psbI, psbL, psbN, and rpl36) had both dN and dS = 0. The highest dS value gene was rps19 (0.041), followed by rps15 (0.032) and rp32 (0.031). The highest dN value gene was ccsA (0.0093), followed by rps15 (0.0082) and ycf1 (0.0078). Most of genes showed ω < 0.5 except rps3, ycf1, matK, rpl33, and rps7. Within the gene groups, rps exhibited the highest dS value and the pet showed the lowest. The rps also showed the highest dN value and the psa with the lowest. The photosynthetic genes (psa and psb) showed the lowest ω compared to other genes ( Figure 3). Concatenating all 67 coding genes exhibited a higher synonymous substitution than non-synonymous substitutions, and the ω was 0.302. Sliding window test of nucleotide diversity (π) in the Santalum chloroplast genomes. The three mutation hotspot regions (π > 0.012) were annotated. The window size was set to 800 bp and the sliding windows size was 100 bp. X-axis, position of the midpoint of a window; Y-axis, π values of each window.
The most commonly employed loci used in plant phylogeny and DNA barcoding (e.g., rbcL, matK, trnH-psbA) were not selected in our study. We compared the sequence divergence of highly variable regions and the three conventional candidate chloroplast DNA barcodes (matK, rbcL, and trnH-psbA). Sequence variation values, such as genetic distance, nucleotide diversity, and the number of variable sites were given in Supplemental Table S4. The three newly identified markers (ccsA-trnL, ΨndhE-ΨndhG-rps15, and ycf1) had a higher genetic divergence and more information sites than the three conventional candidate chloroplast DNA markers. The primers designed for the three variable markers are given in Supplemental  Table S5 and the primers were work well (Supplemental Figure S2).

Natural Selection
Using the branch-model method, the Santalum branch had a significantly lower ω ratio (ω = 0.072; p < 0.001) than all other branches (ω = 0.151). This result indicates that Santalum may have been shaped by stronger purifying selection.
The average values of dS, dN and ω are shown in Supplemental Table S6. Among these genes, 11 genes had dS = 0, 23 genes had dN = 0, and 7 genes (petG, petN, psaC, psbI, psbL, psbN, and rpl36) had both dN and dS = 0. The highest dS value gene was rps19 (0.041), followed by rps15 (0.032) and rp32 (0.031). The highest dN value gene was ccsA (0.0093), followed by rps15 (0.0082) and ycf1 (0.0078). Most of genes showed ω < 0.5 except rps3, ycf1, matK, rpl33, and rps7. Within the gene groups, rps exhibited the highest dS value and the pet showed the lowest. The rps also showed the highest dN value and the psa with the lowest. The photosynthetic genes (psa and psb) showed the lowest ω compared to other genes (Figure 3). Concatenating all 67 coding genes exhibited a higher synonymous substitution than non-synonymous substitutions, and the ω was 0.302. ratio (ω = 0.072; p < 0.001) than all other branches (ω = 0.151). This result indicates Santalum may have been shaped by stronger purifying selection.
The average values of dS, dN and ω are shown in Supplemental Table S6. Am these genes, 11 genes had dS = 0, 23 genes had dN = 0, and 7 genes (petG, petN, psaC psbL, psbN, and rpl36) had both dN and dS = 0. The highest dS value gene was rps19 (0 followed by rps15 (0.032) and rp32 (0.031). The highest dN value gene was ccsA (0.0 followed by rps15 (0.0082) and ycf1 (0.0078). Most of genes showed ω < 0.5 except ycf1, matK, rpl33, and rps7. Within the gene groups, rps exhibited the highest dS valu the pet showed the lowest. The rps also showed the highest dN value and the psa wit lowest. The photosynthetic genes (psa and psb) showed the lowest ω compared to genes (Figure 3). Concatenating all 67 coding genes exhibited a higher synonymous stitution than non-synonymous substitutions, and the ω was 0.302.

Microstructural Mutation Variable
Among the chloroplast genomes of Santalum species, there were 460 indels in total, including 269 normal indels, 104 repeat-related indels, and 87 SSR-related indels. Most of the indels (77.17%, 355 times) were in the spacer regions, 57 indels were found in the intron regions (12.39%), 26 indels occurred in the pseudogene regions, and 22 indels were found in the exon regions. All SSR-related indels were located in non-coding regions. The length of the normal indels ranged from 1 to 331 bp (Figure 4), and 1-bp indels were the most type (37.92%). The longest normal indel occurred in the ycf4-cemA region, and was a deletion in S. spicatum. Repeat related indels ranged from 2 bp to 28 bp; the longest indel was located in atpH-atpI and was an insert in S. boninense, S. paniculatum, and S. ellipticum var. littorale. Most of the repeat-related indels were 4 to 6 bp long (71.15%). A total of 109 regions had indels: ycf3-trnS had 17 indels, followed by trnL-rpl32 (15 indels), rps16-trnQ (14 indels), atpH-atpI (13 indels), petA-psbJ (12 indels), and matK-rps16 (12 indels). For the coding regions, the ycf1 gene had the most indels (9 indels). most type (37.92%). The longest normal indel occurred in the ycf4-cemA region, and was a deletion in S. spicatum. Repeat related indels ranged from 2 bp to 28 bp; the longest indel was located in atpH-atpI and was an insert in S. boninense, S. paniculatum, and S. ellipticum var. littorale. Most of the repeat-related indels were 4 to 6 bp long (71.15%). A total of 109 regions had indels: ycf3-trnS had 17 indels, followed by trnL-rpl32 (15 indels), rps16-trnQ (14 indels), atpH-atpI (13 indels), petA-psbJ (12 indels), and matK-rps16 (12 indels). For the coding regions, the ycf1 gene had the most indels (9 indels). Fourteen small inversions were identified in the Santalum chloroplast genome. All of the inversions and their inverted repeating flanking sequences formed stem-loop structures. The inversion lengths were from 2 to 33 bp, and the franking repeats were from 7 bp to 25 bp (Supplemental Table S7). There was no correlation between the length of inversion and the flanking repeats sequences. Seven inversions occurred in the LSC regions; four were located in the SSC region, and three in the IR regions. All the inversions were located in non-coding regions. Five inversions (in ndhB, rpl33-rps18, rps15-ycf1, trnH-psbA, and trnM-atpE) were specific to S. acuminatum. The inversion in ndhD-psaC occurred in S. spicatum, while the inversions in trnL-rpl32 and petN-psbM were specific to S. album. S. album had one sample with inversions at ycf2-trnL and psaJ-rpl33. Fourteen small inversions were identified in the Santalum chloroplast genome. All of the inversions and their inverted repeating flanking sequences formed stem-loop structures. The inversion lengths were from 2 to 33 bp, and the franking repeats were from 7 bp to 25 bp (Supplemental Table S7). There was no correlation between the length of inversion and the flanking repeats sequences. Seven inversions occurred in the LSC regions; four were located in the SSC region, and three in the IR regions. All the inversions were located in non-coding regions. Five inversions (in ndhB, rpl33-rps18, rps15-ycf1, trnH-psbA, and trnM-atpE) were specific to S. acuminatum. The inversion in ndhD-psaC occurred in S. spicatum, while the inversions in trnL-rpl32 and petN-psbM were specific to S. album. S. album had one sample with inversions at ycf2-trnL and psaJ-rpl33.

Phylogenetic Inference
The species information from GenBank was shown in Supplemental Table S8. The 13CPG dataset matrix included 150,415 nucleotide sites, of which 6259 were variable sites. The second data matrix, 70 g 50 s, contained 66 protein-coding genes and four rRNA genes from 50 Santalales species. After excluding ambiguous regions and sites, this dataset contained 56,789 nucleotide sites, of which 13,458 (23.70%) were parsimonyinformative sites.
The 70 g 50 s dataset was analyzed under the Akaike Information Corrected Criterion (AICc) using strict hierarchical clustering analysis in PartitionFinder (lnL = −263,607.113888; AICc = 528,592.787194). The optimal partitioning scheme contained 57 partitions (Table 3). The ML tree under the unpartitioned and the three partitioned schemes produced identical topologies ( Figure 5 and Supplemental Figures S3-S5). The ML tree inferred from the 13CPG and 70 g 50 s datasets were similar to the phylogenetic relationships of Santalum species ( Figure 5).

The Estimated Divergence Time
Bayesian relaxed molecular clock analyses suggested that the crown age of the Santalales was 113.91 Mya (Figure 6). The split between the Santalaceae and its closest relatives, Viscaceae and Amphorogynaceae, occurred 81.  According to the 70 g 50 s dataset, we inferred the phylogeny of Santalales. The ML tree showed that all the families were generally resolved and supported as monophyletic clades. According to the results of Chen et al. and Guo et al. [30,33], we selected Erythropalum scandens (Erythropalaceae) as the root of this tree. Ximeniaceae was the earliest diverged lineage. Loranthaceae and Schoepfiaceae formed a clade (BS = 100/PP = 1). Opiliaceae followed by Cervantesiacea were successive sisters to a clade comprising the remaining Santalales. Santalaceae was sister to Viscaceae plus Amphorogynaceae (BS = 100/PP = 1).
All Santalum species formed a monophyletic clade (BS = 100/PP = 1) and were sister to Osyris wightiana within Santalaceae. The phylogenetic relationships among the Santalum species based on the 70 g 50 s dataset and 13CPG dataset had a similar topology. The phylogeny of Santalum had a short branch on the phylogenetic tree, indicating low diver-gence among Santalum species. Santalum spicatum was the first diverged branch. Santalum acuminatum was sister to the remaining species which formed two lineages. The first lineage included three species (S. leptocladum, S. freycinetianum var. pyrularium, and S. sp.) and within the second lineage, two samples of S. album were sister to the remaining species, and the relationships of the three branches were not clear (70 g 50 s: BS = 48/BI = 0.53, 13CPG: BS = 49/BI = 0.72). The S. boninense from section Santalum was sister to S. paniculatum and S. ellipticum from the section Hawaiiensia ( Figure 5). S. acuminatum and S. spicatum were in the formerly recognized the Australian section Eucarya.

The Estimated Divergence Time
Bayesian relaxed molecular clock analyses suggested that the crown age of the Santalales was 113.91 Mya (Figure 6). The split between the Santalaceae and its closest relatives, Viscaceae and Amphorogynaceae, occurred 81.

Santalum Chloroplast Genome Evolution and Variation
Our findings revealed that the Santalum chloroplast genomes have highly similar genome structures, genome sizes, and gene contents (Figures 3 and 4). Our findings are similar to other chloroplast genome studies reporting that single-copy regions and non-cod-

Santalum Chloroplast Genome Evolution and Variation
Our findings revealed that the Santalum chloroplast genomes have highly similar genome structures, genome sizes, and gene contents (Figures 3 and 4). Our findings are similar to other chloroplast genome studies reporting that single-copy regions and noncoding regions are more variable than IRs and coding regions [45,62]. The variation in size relative to other angiosperm species was mainly due to some missing genes ( Figure 1).
All encoded NAD(P)H dehydrogenase complex (Ndh) genes in the Santalum chloroplast genome had functional or physical losses. The ndh genes were the earliest functional losses in the chloroplast genome of hemiparasites [31,63]. All the ndh genes have been lost in the Santalales hemiparasites [30,33], suggesting that the chloroplast NDH pathway is not essential in these lineages, or this function has been transferred to the nuclear genome [31,64]. Another degraded chloroplast gene in the Santalum chloroplast genome was infA; this mutation was detected in most Santalales hemiparasites. The infA gene is a translation initiation factor of the translation initiation complex. Loss and pseudogenization of infA has occurred in many hemiparasitic and holoparasitic plants [31,65]. This gene has also been independently lost multiple times among photoautotrophic plant lineages [66].
The non-synonymous (dN) and synonymous (dS) nucleotide substitution are indicators of the rates of evolution and natural selection. Selection pressure is another factor affecting the rate of sequence evolution. The ω has become a standard measure of selective pressure with ω = 1, >1, <1 signifying neutral evolution, positive selection, and negative or purifying selection [67]. The ω for the branches of Santalales estimated by CODEML under the branch model were less than 1. This indicated that the patterns of parasitic plant chloroplast genome variations were not consistent with strictly neutral molecular evolutions [48,68]. Purifying selection may be the predominant force shaping parasitic plants evolution. The Santalum clade had a lower ω (ω = 0.072) than other Santalales species (ω = 0.151), which means that Santalum may have been shaped by stronger purifying selection than other Santalales species.
The chloroplast genome is inherited as a linked unit, however, more evidence suggesting the strength of selection acting on each coding genes or gene groups were different [46,69]. The lower ω in most genes showed all species underwent a purifying selection (ω < 0.5). Five genes (rps3, rps7, ycf1, matK, and rpl33) in Santalum chloroplast genome have a higher ω value (Supplemental Table S6), of which three were ribosomal protein genes (rpl and rps). dN analyses also showed that these two gene groups have fast non-synonymous substitution rates (Figure 3), which were similar to those of photosynthesis plants [46,70]. matK and ycf1 were two highly divergent genes in the chloroplast genome and are frequently used in plant phylogeny and DNA barcoding (see below). The product of the matK gene is believed to act as a splicing factor for plastid group IIA introns [71] and the ycf1 gene is essential for protein translocons at inner envelope membranes [72]. Photosynthetic apparatus genes (psa, psb, pet) in Santalum chloroplast genome have the lowest dN rates and ω ratios (Figure 3), which were similar to the nonparasitic plants [73]. It was indicated that the photosynthetic apparatus-related genes in hemiparasitic Santalum were conservative.
Microstructural mutation events are ubiquitous in chloroplast genome evolution, but have been little studied. Indels and small inversions were analyzed in this study ( Figure 4 and Table 2). Based on Dong et al. [51], we classified indels mutations into three categories: SSR-related indels, repeat-related indels, and normal indels. The normal indels were the most frequent in the Santalum chloroplast genome, and the size was also variable, ranging from 1 to 331 bp ( Figure 4). Slipped strand mispairing (SSM) has been suggested as the mechanism leading to most SSR-related indels [74,75]. DNA recombination has also been proposed to cause repeat related indels [74,75]. These different mechanisms might be responsible for the observed differences in indel length.
Mutation hotspot regions in the chloroplast genome have been identified in most plant lineages, and studies have identified those markers were more variable than universal chloroplast markers [62,73,76,77]. We identified three variable regions (ccsA-trnL, ΨndhE-ΨndhG-rps15, and ycf1) by comparing the Santalum chloroplast genomes. The ycf1 gene has been identified to be highly variable in several lineages, such as Dalbergia [78], Diospyros [62], and Quercus [79]. Studies have shown that ycf1 is phylogenetically useful [80] and is associated with a high success rate for DNA barcoding [81]. ccsA-trnL and ΨndhE-ΨndhG-rps15 have been less widely used in the phylogeny and DNA barcoding.

Phylogenetic Relationships of Santalum
Based on morphological characters, such as the floral tube color, position of the ovary, the Santalum has been classified into five sections (Santalum, Solenantha, Hawaiiensia, Polynesica, and Eucarya) [3,4]. Section Santalum was described as usually having reddish corollas that were longer than they were wide and partly superior ovaries [3,4], and most species were located in Australia and New Caledonia. The section Santalum included eight species and two varieties. According to the chloroplast genome data ( Figure 5) and the ITS and ETS data [2,14], these species were at multiple phylogenetic positions in the tree and did not form a monophyletic group. All the datasets supported that S. yasi was closer to S. album. The four species on the Hawaiian Islands were separated into two sections (Solenantha and Hawaiiensia). Section Solenantha was described as having longer perianth tubes, smaller ovaries, and the absence of hair proximal to the filaments. Section Hawaiiensia was delimited based on their white, green, brown, or orange corollas and inferior ovaries [3,4]. Based on the such morphological characters, Skottsberg [82] inferred that these two sections were closely related and Fosberg and Sachét [83] treated section Polynesica as a synonym of section Hawaiiensia. However, molecular phylogenetic results demonstrated that sections Hawaiiensia and Solenantha formed monophyletic groups, but were not closely related. Moreover, sections Hawaiiensia and Polynesica were more closely related to other taxa of Santalum [14]. Two species and eight varieties were included in section Polynesica, which was described as appearing to be similar to Hawaiiensia but with ovaries that were partly superior. In the results of molecular phylogenetic analyses, all the varieties of S. insulare formed a monophyletic group, Santalum fernandezianum, which is located on the Juan Fernandez Islands, and is extinct, thus its position in Santalum cannot be determined. The fifth section, Eucarya, is located in Australia, and includes three species [14]. Both chloroplast genome data and nuclear data (ITS and ETS) supported this section as the earliest diverged lineage of the genus. However, these three species did not form a monophyletic group ( Figure 5) [2].
Compared the results of phylogenetic analyses based on the chloroplast genome and the nuclear data, there were some incongruences. The ITS, ETS, and GBSSI (granule-bound starch synthase) sequences (nuclear data) supported the notion that S. acuminatum and S. spicatum formed a monophyletic group [2,5], which conflicted with the chloroplast genome results ( Figure 5). Santalum ellipticum and S. ellipticum var. littorale did not form a monophyletic group and were closed to S. paniculatum ( Figure 5). This was discordant with the ITS and ETS results [14]. There have been ancient hybridization events in Santalum species, which might account for this discordance [13,84,85]. Polyploidy was one of the reasons for the gene tree conflicts [86,87].
There were several auto-and allopolyploid species in Santalum according to an estimate from measuring the C value [5]. Chloroplast genome data showed the maternal line of the allopolyploid species and were used to identify the maternal progenitor. The allopolyploid species of S. ellipticum, S. paniculatum, and S. boninense formed a clade, and there were no diploid species in this clade, although the maternal parents were unresolved. The progenitors might be extinct or not yet discovered. The biogeography suggests that Santalum island colonists tend to be polyploids [5].
Using five fossils, our results showed that the stem time of Santalum was 38.44 Mya, the crown time was 8.46 Mya, most of lineages appeared no later than 2.16−6.97 Mya and radiated less than 1 Mya (Figure 6). These results indicated that the Santalum underwent rapid, recent radiation in about 2 Mya, which was consistent with dispersal and isolation on islands ( Figure 6). Santalum species exhibit island distribution types. Biogeographic reconstructions indicated the Santalum originated in Australia and went through longdistance dispersal out of Australia [2]. Divergence time results indicated the time of speciation events was less than the time of island formation, which means there were multiple long-distance dispersal between Pacific Islands.

Conclusions
The analyzed Santalum chloroplast genomes have a similar structure, gene number, and gene order. Mutation hotspots regions, small inversions, and the co-occurrence of different types of indels or single nucleotide polymorphisms were identified, and the whole set of ndh genes and the infA gene were found to lose their functions. The phylogenetic and divergence analysis based on the complete chloroplast genome discovered that the Santalum species were originated by radiation evolution, and most speciation events occurred less than 1 Mya.

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10 .3390/f12101303/s1, Figure S1. Chloroplast genome map of Santalum. Genes shown inside circle are transcribed counterclockwise, gene outside are transcribed clockwise. Different functional groups of genes are showed in different colors. Figure S2. Gel profiles of fragments amplified from two samples using four pairs of primers; Figure S3. ML tree with GTR + G model using 70 g 50 s dataset. ML bootstrap support values were presented at each node. ML = 100 were not presented; Figure S4. ML tree with genes partitioning scheme using 70 g 50 s dataset. ML bootstrap support values were presented at each node. ML = 100 were not presented; Figure S5. ML tree with rcluster partitioning scheme using 70 g 50 s dataset. ML bootstrap support values were presented at each node. ML = 100 were not presented. Table S1. The list of species and varieties in Santalum. Table S2. Sampling information for this study; Table S3. Numbers of nucleotide substitutions and sequence genetic distance in 11 Santalum species complete chloroplast genomes. The upper triangle indicates the number of nucleotide substitutions, and the lower triangle indicates the number of sequence distances in complete chloroplast genomes; Table S4. Variability of the three highly mutation hotspot regions and the universal chloroplast DNA barcodes in Santalum; Table S5. Primers of the three highly mutation hotspot regions; Table S6. Average values of dS, dN, and dN/dS ratios among genes and gene groups; Table S7. The chloroplast genomic location and length distributions of 14 inversions of Santalum; Table S8. A list of the 38 taxa sampled from GenBank for phylogenetic analysis.