Comparative Analyses of Chloroplast Genomes for Parasitic Species of Santalales in the Light of Two Newly Sequenced Species, Taxillus nigrans and Scurrula parasitica

When a flowering plant species shifts from autotrophy to parasitism, its chloroplast genome may undergo functional and physical reduction and gene loss. Most species of Santalales are hemiparasitic, yet few studies have compared the chloroplast genomes of species in this order. In this study, we collected and compared the chloroplast genomes of 12 species of Santalales, including those of Taxillus nigrans and Scurrula parasitica, which we sequenced for the first time. The chloroplast genomes of these species showed the typical quadripartite structure. Phylogenetic analysis suggested that the 12 species clustered into three clades: Viscum (4 spp.) and Osyris (1 sp.) in the Santalaceae and Champereia (1 sp.) in the Opiliaceae formed one clade, while Taxillus (3 spp.) and Scurrula (1 sp.) in the Loranthaceae and Schoepfia (1 sp.) in the Schoepfiaceae formed another. Erythropalum (1 sp.), in the Erythropalaceae, formed a third and most distant clade within the Santalales. In addition, both Viscum and Taxillus are monophyletic, and Scurrula is sister to Taxillus. Comparative analysis of the chloroplast genomes revealed differences in genome size and the loss of genes such as the ndh genes, infA, some ribosomal protein genes, and tRNA genes. Based on gene loss, gene order, and genome structure, the 12 species were classified into six categories. Each of five genera (Viscum, Osyris, Champereia, Schoepfia, and Erythropalum) represented an independent category, while the three Taxillus species and Scurrula formed a sixth. Although different genes were lost in different categories, most photosynthesis-related genes were retained in all 12 species. The genetic information is therefore consistent with the observation that these species are hemiparasitic. Our comparative genomic analyses provide a new case study of chloroplast genome evolution in parasitic plants.


Introduction
Parasitic plants obtain all or most of their nutrients and water from their host plants and are often divided into hemiparasitic and holoparasitic species [1,2]. Hemiparasitic species obtain nutrients and water from their hosts but can also produce organic matter through photosynthesis. Holoparasitic species, however, cannot photosynthesize, as their leaves are usually reduced to scales and they lack sufficient chlorophyll [3]. On the tree of life, most parasitic plant species are nested within different clades of autotrophic species, suggesting that they most likely underwent a life-history transition from autotrophy to parasitism [4]. During this process, many morphological changes may have occurred, including the reduction of roots to haustoria [5] and the loss of photosynthetic tissue. At the genetic level, the chloroplast genomes of parasitic plants undergo functional and physical reduction, gene loss, and other changes [6][7][8].
Since parasitic plants use their haustoria to penetrate their host plants' tissues and take up nutrients, their ability to produce carbohydrates via photosynthesis may have degraded,

DNA Sequencing and Chloroplast Genome Assembly
The experimental samples were fresh leaves of T. nigrans and S. parasitica, parasitizing Platanus acerifolia and Ligustrum lucidum, respectively, collected at Sichuan University. The leaves were immediately frozen in liquid nitrogen and sent to the Novogene Company for DNA extraction and genome sequencing. Total DNA was extracted from the leaf samples using a modified CTAB method [27], and DNA purity (OD260/280 ratio) was assessed with a NanoDrop spectrophotometer. Sequencing was performed on an Illumina high-throughput sequencing platform (HiSeq/MiSeq). In total, we obtained 6.73 G and 6.85 G of raw reads from the T. nigrans and S. parasitica samples, respectively, with GC contents of 42.3% and 41.6%.
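The GC percentages above are simple base-composition ratios. A minimal sketch of that calculation follows, assuming plain FASTQ input with four lines per record; the function names are hypothetical, and excluding ambiguous bases (such as N) from the denominator is our assumption, since the sequencing provider's convention is not stated.

```python
def fastq_sequences(path):
    """Yield the sequence line of each record in a FASTQ file.

    Assumes the common four-line-per-record layout (no wrapped sequences).
    """
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 1:  # header, sequence, '+', quality
                yield line.strip()


def gc_percent(sequences):
    """Percentage of G + C among all unambiguous A/C/G/T bases.

    Ambiguous bases (e.g. N) are left out of the denominator; this is an
    assumption, not necessarily the sequencing provider's convention.
    """
    gc = acgt = 0
    for seq in sequences:
        seq = seq.upper()
        g, c = seq.count("G"), seq.count("C")
        a, t = seq.count("A"), seq.count("T")
        gc += g + c
        acgt += a + c + g + t
    if acgt == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / acgt


print(gc_percent(["ACGT", "GGCC", "ATNN"]))  # 60.0
```

For a whole run, gc_percent(fastq_sequences(path)) streams the reads without loading the file into memory.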
We used Trimmomatic v0.36 [28] to filter the raw reads and obtain high-quality clean reads. The clean reads were then mapped with BWA-MEM v0.7.12 [29] to the chloroplast genome of T. chinensis (GenBank: NC_036306.1) as the reference. We used NOVOPlasty v2.6.3 [30] and Velvet v1.2.07 [31] to assemble the chloroplast genomes: contigs were joined into scaffolds, which were then used to assemble the complete chloroplast genome.
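The Trimmomatic settings used here are not reported. As an illustration of one common quality-filtering step, the sketch below implements a simplified sliding-window trim: it scans from the 5′ end and cuts the read at the start of the first window whose mean Phred score falls below a threshold. The window size (4) and threshold (15) are assumed values, and this is not Trimmomatic's exact SLIDINGWINDOW implementation.

```python
def phred_scores(quality, offset=33):
    """Decode an ASCII quality string (Phred+33 by default) to integer scores."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq, quality, window=4, min_mean=15, offset=33):
    """Return (seq, quality) cut at the first low-quality window.

    Scans windows of `window` bases from the 5' end; when a window's mean
    Phred score is below `min_mean`, the read is truncated at that window's
    start. Reads shorter than one window are returned unchanged.
    """
    scores = phred_scores(quality, offset)
    if len(scores) != len(seq):
        raise ValueError("sequence and quality strings differ in length")
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean:
            return seq[:start], quality[:start]
    return seq, quality


# 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
print(sliding_window_trim("ACGTACGTACGT", "IIIIIIII####"))  # ('ACGTACG', 'IIIIIII')
```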

Gene Annotation and Sequence Analyses
We used Geneious v11.0.3 [32] to check the assembly results and save them as FASTA files. The T. nigrans and S. parasitica chloroplast genomes were annotated with Plann v1.1 [33], using the T. chinensis chloroplast genome annotation as a reference, and the annotations were then corrected with Geneious v8.1.4 [32] and Sequin v15.10. The final chloroplast genome sequences of T. nigrans and S. parasitica were submitted to GenBank. Chloroplast gene maps were drawn with the online program Organellar Genome DRAW (OGDRAW) v1.2 (http://ogdraw.mpimp-golm.mpg.de/, accessed on 12 April 2018). To characterize variation in synonymous codon usage, we used MEGA6 [34] to calculate codon usage and relative synonymous codon usage (RSCU) values, which remove the influence of amino acid composition.
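RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally: RSCU_ij = x_ij / ((1/n_i) × Σ_j x_ij), where x_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons for amino acid i. The sketch below computes this from a dictionary of codon counts; it is an illustration of the formula, not MEGA's code, and the counts are toy values.

```python
from collections import defaultdict
from itertools import product

# Amino acids of the standard genetic code, codons enumerated in TCAG order
# (first position varies slowest). NCBI table 11 (plastids) assigns the same
# amino acids and differs only in its set of alternative start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(codon_counts):
    """Relative synonymous codon usage from a {codon: count} mapping (DNA alphabet)."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)  # stop codons form their own family ('*')
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)  # count under equal synonymous usage
        for c in codons:
            values[c] = codon_counts.get(c, 0) / expected
    return values


values = rscu({"TTT": 30, "TTC": 10, "ATG": 5})
print(values["TTT"], values["TTC"], values["ATG"])  # 1.5 0.5 1.0
```

Codons with RSCU > 1 are used more often than expected under equal usage, which is the criterion applied to preferred codons in Table 4.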

Phylogenetic Analysis
Phylogenetic trees were constructed from the chloroplast genomes of the 12 species of Santalales to infer their phylogenetic relationships. We used MAFFT v7.158 [35] and MEGA v6.0 [34] to extract and align the amino acid sequences of the proteins encoded by their shared genes. A maximum likelihood (ML) tree was constructed with RAxML v8.2.11 [36] under the PROTGAMMAJTT model. Two autotrophic species, Pyrola rotundifolia (KU833271.1) and Vaccinium macrocarpon (NC_019616.1), were used as outgroups. The resulting tree was then examined with FigTree v1.4.3.
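The text does not state how the per-gene protein alignments were combined for RAxML. A common approach, assumed here, is to keep the genes shared by all taxa and concatenate their alignments into a single supermatrix, recording gene boundaries so that partitions could be defined later. The sketch below does this and emits a sequential PHYLIP-style layout; the function names, taxa, and sequences are hypothetical toy values.

```python
def shared_genes(gene_sets):
    """Genes present in every taxon; gene_sets maps taxon -> set of gene names."""
    sets = list(gene_sets.values())
    return set.intersection(*sets) if sets else set()


def concatenate(alignments, genes, taxa):
    """Concatenate per-gene alignments into a supermatrix.

    alignments: gene -> {taxon -> aligned amino-acid string}. Each gene's
    sequences must already be aligned (equal length). Returns the matrix and
    a list of (gene, first_column, last_column) partitions (1-based).
    """
    rows = {taxon: [] for taxon in taxa}
    partitions, position = [], 0
    for gene in sorted(genes):
        lengths = {len(alignments[gene][taxon]) for taxon in taxa}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences differ in length (not aligned)")
        length = lengths.pop()
        for taxon in taxa:
            rows[taxon].append(alignments[gene][taxon])
        partitions.append((gene, position + 1, position + length))
        position += length
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


def to_phylip(matrix):
    """Sequential PHYLIP-style text: 'ntaxa nchar' header, then 'name sequence' lines."""
    nchar = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {nchar}"]
    lines += [f"{taxon} {seq}" for taxon, seq in matrix.items()]
    return "\n".join(lines) + "\n"


# Toy example with hypothetical taxa and sequences.
genes = shared_genes({"A": {"rbcL", "psbA", "ndhB"}, "B": {"rbcL", "psbA"}})
alignments = {"rbcL": {"A": "MK-", "B": "MKL"}, "psbA": {"A": "TI", "B": "TV"}}
matrix, partitions = concatenate(alignments, genes, ["A", "B"])
print(to_phylip(matrix), partitions)
```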

Characteristics of T. nigrans and S. parasitica Chloroplast Genomes
The chloroplast genomes of T. nigrans and S. parasitica are circular molecules with the typical quadripartite structure (Figure 1). Their lengths were 121,419 bp and 121,750 bp, respectively. Each genome comprised a pair of inverted repeat (IR) regions (T. nigrans, 22,569 bp; S. parasitica, 22,687 bp) separated by the large single-copy (LSC) region (T. nigrans, 70,181 bp; S. parasitica, 70,270 bp) and the small single-copy (SSC) region (T. nigrans, 6100 bp; S. parasitica, 6106 bp) (Figure 1). The GC contents of the T. nigrans and S. parasitica chloroplast DNA were 37.4% and 37.2%, respectively, and GC was unevenly distributed across the genomes (Table 1). A total of 106 genes were annotated in each of T. nigrans and S. parasitica, including 66 protein-coding genes, 28 tRNA genes, eight rRNA genes, and four pseudogenes (Table 2).
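Because a quadripartite plastome contains one LSC, one SSC, and two IR copies, the region sizes above should satisfy total = LSC + SSC + 2 × IR. The short check below confirms this for both genomes, and that the gene categories sum to 106; all numbers are taken from the paragraph above.

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: one LSC, one SSC and two inverted-repeat copies."""
    return lsc + ssc + 2 * ir


# Region lengths (bp) reported above.
genomes = {
    "T. nigrans": {"total": 121_419, "LSC": 70_181, "SSC": 6_100, "IR": 22_569},
    "S. parasitica": {"total": 121_750, "LSC": 70_270, "SSC": 6_106, "IR": 22_687},
}

for name, g in genomes.items():
    computed = quadripartite_length(g["LSC"], g["SSC"], g["IR"])
    print(name, computed, computed == g["total"])  # both lines end in True

# 66 protein-coding + 28 tRNA + 8 rRNA + 4 pseudogenes
print(66 + 28 + 8 + 4)  # 106
```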
In T. nigrans, 3993 codons (9.9%) encoded leucine, while 581 (1.4%) encoded tryptophan. Similarly, in S. parasitica, 4027 codons (9.9%) encoded leucine and 524 (1.3%) encoded tryptophan. In both species, leucine was the most frequent and tryptophan the least frequent amino acid. The complete chloroplast genome sequences of T. nigrans and S. parasitica have been deposited in GenBank under accession numbers MH095982 and MH101514, respectively.
Table 2 (excerpt). Other proteins: accD, ccsA, cemA, clpP, matK; proteins of unknown function: ycf2; ribosomal RNAs: rrn4.5S, rrn5S, rrn16S, rrn23S.
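Amino acid frequencies such as the leucine and tryptophan percentages above are obtained by translating each in-frame codon of the protein-coding sequences and tallying the results. A minimal sketch follows; whether stop codons count toward the denominator is not stated in the text, so it is exposed as a parameter (an assumption), and the input is a toy sequence.

```python
from collections import Counter
from itertools import product

# Standard code in TCAG order; plastids (NCBI table 11) share these assignments.
CODON_TABLE = {
    "".join(c): aa
    for c, aa in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def amino_acid_usage(cds_sequences, include_stops=False):
    """Return {amino acid: (codon count, percent)} over in-frame codons."""
    counts = Counter()
    for cds in cds_sequences:
        s = cds.upper().replace("U", "T")
        for i in range(0, len(s) - len(s) % 3, 3):
            aa = CODON_TABLE.get(s[i:i + 3])
            if aa is None:  # codon with an ambiguous base
                continue
            if aa == "*" and not include_stops:
                continue
            counts[aa] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: (n, 100.0 * n / total) for aa, n in counts.most_common()}


print(amino_acid_usage(["ATGCTGCTTTGGTAA"]))
# {'L': (2, 50.0), 'M': (1, 25.0), 'W': (1, 25.0)}
```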

Comparative Chloroplast Genomic Analysis
A comparative analysis of the chloroplast genomes of the 12 species of Santalales (Table 3) showed that genome length varied from 118 kb to 156 kb. The S. jasminodora chloroplast genome was the shortest and the E. scandens chloroplast genome the longest. E. scandens had the largest LSC (84,799 bp) and SSC (18,567 bp) of the 12 species; in relative terms, however, it had the smallest LSC proportion and the largest SSC proportion. IR length varied from 22 kb to 28 kb. S. jasminodora had the smallest IR, much shorter than those of the other 11 species, but its LSC accounted for the largest proportion of its genome. C. manillana had the largest IR (28,075 bp), which also accounted for the largest proportion of its genome. Based on the lengths of the LSC, SSC, and IR (Table 3), the chloroplast genomes of the 12 species can be classified into six categories: Viscum, Osyris, Champereia, Schoepfia, and Erythropalum each represent an independent category, while the three Taxillus species and Scurrula form a sixth. The three Taxillus species and Scurrula showed minimal differences in chloroplast genome size and composition and had the smallest SSC proportions. Likewise, the four Viscum species were similar in the lengths of the whole chloroplast genome, LSC, SSC, and IR, in gene positions, and in the mVista analysis, although they differed in the total number of genes and in the numbers of protein-coding and tRNA genes. Thus, within each of these two categories, the species differed only slightly. By contrast, the remaining four categories (Champereia, Erythropalum, Osyris, and Schoepfia) differed markedly, especially in the lengths of the chloroplast genome, SSC, and IR (Table 3).
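The proportions discussed here are each region's share of total genome length. The text does not say whether the IR share counts one or both copies, so the sketch below reports both; it is illustrated with the T. nigrans region lengths reported earlier.

```python
def region_shares(lsc, ssc, ir):
    """Percentage of total plastome length occupied by each region."""
    total = lsc + ssc + 2 * ir
    return {
        "LSC": 100 * lsc / total,
        "SSC": 100 * ssc / total,
        "IR (one copy)": 100 * ir / total,
        "IR (both copies)": 100 * 2 * ir / total,
    }


shares = region_shares(lsc=70_181, ssc=6_100, ir=22_569)  # T. nigrans
print({region: round(pct, 1) for region, pct in shares.items()})
# {'LSC': 57.8, 'SSC': 5.0, 'IR (one copy)': 18.6, 'IR (both copies)': 37.2}
```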
Overall, across the chloroplast genomes of the 12 species, coding regions were more conserved than non-coding regions, and the IRs were less divergent than the LSC and SSC regions. The eight rRNA genes showed no large indels and were highly conserved, whereas tRNA genes and some protein-coding genes, such as rpoC2 and ycf2, contained large indels. The chloroplast genomes of T. nigrans and T. sutchuenensis differed the least, and those of the three Taxillus species and S. parasitica were very similar to one another (Figures 2 and 3).
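mVista summarizes conservation as percent identity in sliding windows along an alignment. The sketch below computes a simplified version of that statistic for two pre-aligned sequences; the window and step sizes are assumed defaults, a gap against a base counts as a mismatch, gap-only columns are skipped, and it does not reproduce mVista's own alignment or scoring.

```python
def window_identity(aligned_a, aligned_b, window=100, step=25):
    """Percent identity in sliding windows over two aligned sequences.

    Returns a list of (1-based window start, percent identity). Columns where
    both sequences have a gap are skipped; a gap against a base is a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(aligned_a) - window + 1, step):
        matches = compared = 0
        for a, b in zip(aligned_a[start:start + window], aligned_b[start:start + window]):
            if a == "-" and b == "-":
                continue
            compared += 1
            matches += a == b
        identity = 100.0 * matches / compared if compared else float("nan")
        results.append((start + 1, identity))
    return results


print(window_identity("AAAATTTT", "AAAATTGG", window=4, step=4))
# [(1, 100.0), (5, 50.0)]
```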
In all 12 chloroplast genomes, the eight rRNA genes, several rps genes (rps2, 3, 4, 7, 8, 11, 12, 14, 18, and 19), several rpl genes (rpl2, 14, 20, 22, 23, and 36), and trnE and trnfM were present and relatively conserved. In V. minimum, the segment from trnT-GGU to trnQ-UUG was inverted relative to the other 11 species. In O. alba, trnR-ACG and trnN-GUU were absent, and the segment containing ccsA and trnL-UAG was inverted relative to the other 11 species.
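Inversions such as these can be detected by comparing signed gene orders: within an inverted segment, genes appear in reverse order and on the opposite strand relative to a reference. A minimal sketch follows, assuming each genome is given as an ordered list of (gene, strand) pairs with unique gene names; the gene names in the example are placeholders, and a gene lost from inside an inverted segment would split it into two reported blocks.

```python
def inverted_blocks(reference, query):
    """Return runs of genes that are inverted in `query` relative to `reference`.

    Both arguments are ordered lists of (gene, strand) pairs, strand +1 or -1,
    with unique gene names. A run is a stretch of consecutive query genes that
    sit on the opposite strand and in reverse reference order. Genes absent
    from the reference are ignored.
    """
    ref = {gene: (index, strand) for index, (gene, strand) in enumerate(reference)}
    blocks, current, last_index = [], [], None
    for gene, strand in query:
        if gene not in ref:
            continue
        index, ref_strand = ref[gene]
        flipped = strand != ref_strand
        if flipped and current and index == last_index - 1:
            current.append(gene)  # extends the current inverted run
        else:
            if current:
                blocks.append(current)
            current = [gene] if flipped else []
        last_index = index
    if current:
        blocks.append(current)
    return blocks


# Placeholder gene names; g2..g4 are inverted in the query.
reference = [("g1", 1), ("g2", 1), ("g3", -1), ("g4", 1), ("g5", 1)]
query = [("g1", 1), ("g4", -1), ("g3", 1), ("g2", -1), ("g5", 1)]
print(inverted_blocks(reference, query))  # [['g4', 'g3', 'g2']]
```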
By comparing the chloroplast genomes of the 12 species of Santalales, we found that several genes were missing in the three Taxillus species and in S. parasitica, including several NAD(P)H dehydrogenase complex subunit (ndh) genes, four ribosomal protein genes (rpl32, rps15, rps16, and rps33), one ycf gene (ycf1), and the translation initiation factor gene (infA). Ten tRNA genes (trnL-UAA, trnI-GAU, trnK-UUU, trnP-GGG, trnP-UGG, trnH-GUG, trnQ-UUG, trnG-UCC, trnV-UAC, and trnA-UGC) were also missing from the chloroplast genomes of these four species. In the same four species, two ribosomal protein genes (rpl16 and rpl2) and the repeated gene ycf15 have degenerated into pseudogenes, as their coding regions are interrupted by deletions, insertions, or internal stop codons (Figure 4). The trnH-GUG gene was lost from the chloroplast genomes in the other four species.
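The pseudogene criteria mentioned here (frameshifting indels and internal stop codons) can be screened for with a simple reading-frame check. The sketch below flags a putative coding sequence whose length is not a multiple of three, which contains a premature stop codon, or which lacks a terminal stop. It is a heuristic, not the annotation pipeline used in this study, and it ignores RNA editing, which can alter codons in plastid transcripts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_problems(cds):
    """List simple signs of pseudogenization in a putative coding sequence.

    Heuristic only: RNA editing and non-canonical start codons are ignored.
    """
    s = cds.upper().replace("U", "T")
    problems = []
    if len(s) % 3:
        problems.append(f"length {len(s)} is not a multiple of 3 (possible frameshift)")
    codons = [s[i:i + 3] for i in range(0, len(s) - len(s) % 3, 3)]
    for number, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {number}")
    if codons and codons[-1] not in STOP_CODONS:
        problems.append("no terminal stop codon")
    return problems


print(frame_problems("ATGAAATAA"))     # []
print(frame_problems("ATGTAAAAATAA"))  # ['internal stop codon TAA at codon 2']
```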

Phylogenetic Analysis
The chloroplast genome provides essential data for evolutionary, taxonomic, and phylogenetic studies. Eleven of the 12 nodes in the maximum likelihood tree received 94% to 100% bootstrap support (Figure 5). Phylogenetic analysis revealed that 11 of the 12 species of Santalales clustered into two highly supported clades. One clade includes Viscum (4 spp.) and Osyris (Santalaceae) together with Champereia (Opiliaceae), while the other includes Taxillus (3 spp.) and Scurrula (Loranthaceae) together with Schoepfia (Schoepfiaceae). A third clade, containing only Erythropalum (Erythropalaceae), is the most distant within the Santalales. While the monophyly of Viscum was strongly supported, the monophyly of Santalaceae, represented here by the sister relationship between Viscum (Visceae) and Osyris (Santaleae), received only moderate bootstrap support (75%). The monophyly of Santalaceae therefore warrants further study.
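Bootstrap support for a clade is the percentage of bootstrap replicate trees in which that clade is recovered. The sketch below computes it, assuming each replicate tree has already been rooted on the outgroup and reduced to the set of its clades (each a frozenset of taxon names); the taxa and trees are toy placeholders, not the replicates from this analysis.

```python
def clade_support(replicate_trees, clade):
    """Percent of replicate trees containing `clade`.

    Each tree is a set of clades; each clade is a frozenset of taxon names.
    Trees are assumed to be rooted identically (e.g. on the outgroup).
    """
    if not replicate_trees:
        raise ValueError("no replicate trees supplied")
    target = frozenset(clade)
    hits = sum(target in tree for tree in replicate_trees)
    return 100.0 * hits / len(replicate_trees)


# Toy replicates over taxa A-D (hypothetical).
ab, cd, abc = frozenset("AB"), frozenset("CD"), frozenset("ABC")
ac, bd = frozenset("AC"), frozenset("BD")
replicates = [{ab, cd}, {ab, abc}, {ab, cd}, {ab, abc}, {ac, bd}]
print(clade_support(replicates, {"A", "B"}))  # 80.0
```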

Discussion
In the codon usage analysis, A and T were preferred at the third codon position, and the same preference appeared in the stop codons. A- and U-ending codons were generally overrepresented: except for the codons corresponding to trnL-CAA and trnS-GGA, all preferred synonymous codons (RSCU > 1) ended in A or U (Table 4). A preference for A and T at the third codon position is common in plant chloroplast genomes [37], and this general pattern has been used to distinguish chloroplast DNA from mitochondrial and nuclear DNA [37].
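The third-position preference can be quantified by tallying the last base of every in-frame codon. The sketch below returns the fraction of A, T, G, and C at third positions for a set of coding sequences, with stop codons optionally included; the input is a toy sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def third_position_composition(cds_sequences, include_stops=False):
    """Fraction of A, T, G and C at third codon positions across in-frame codons."""
    counts = {"A": 0, "T": 0, "G": 0, "C": 0}
    for cds in cds_sequences:
        s = cds.upper().replace("U", "T")
        for i in range(0, len(s) - len(s) % 3, 3):
            codon = s[i:i + 3]
            if codon in STOP_CODONS and not include_stops:
                continue
            if codon[2] in counts:  # skip ambiguous bases such as N
                counts[codon[2]] += 1
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()} if total else {}


composition = third_position_composition(["ATGAAATTTGCCTAA"])
print(composition, composition["A"] + composition["T"])
# {'A': 0.25, 'T': 0.25, 'G': 0.25, 'C': 0.25} 0.5
```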
Advances in phylogenetic studies indicate that chloroplast genome evolution involves both nucleotide substitutions and changes in genomic structure [37,38], the latter including the loss of genes and introns. Previous studies have shown that introns play a significant role in regulating gene expression and alternative splicing and can enhance the expression of exogenous genes at particular sites and times in plants. Introns have also been reported to stabilize transcription substantially in some eukaryotes [13,39]. The chloroplast genomes of T. nigrans and S. parasitica contained seven intron-containing genes: atpF, rpoC1, ycf3, rps12, petB, petD, and rpl2 (Table 5). Among them, ycf3, located in the LSC, contained two introns and three exons. The rps12 gene is trans-spliced, with its 5′ exon in the LSC and its 3′ exon in the IR. Comparative analysis revealed that rps12 was also shortened owing to the loss of cis-spliced introns. However, rps12 may still be able to code and be expressed, since no part of its coding region appears to have been degraded by length reduction or frameshift mutation [40].
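A trans-spliced gene such as rps12 cannot be read as one contiguous stretch of the genome: its exons lie in different regions and may be on opposite strands, so the coding sequence is reconstructed by extracting each exon, reverse-complementing those on the minus strand, and joining them in transcript order. A minimal sketch follows, assuming 1-based inclusive exon coordinates; the genome string and coordinates are toy values, not the real rps12 positions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def assemble_cds(genome, exons):
    """Join exons given in transcript order as (start, end, strand).

    Coordinates are 1-based and inclusive; strand is '+' or '-'. Minus-strand
    exons are reverse-complemented before joining.
    """
    parts = []
    for start, end, strand in exons:
        segment = genome[start - 1:end]
        if strand == "+":
            parts.append(segment)
        elif strand == "-":
            parts.append(reverse_complement(segment))
        else:
            raise ValueError(f"unknown strand {strand!r}")
    return "".join(parts)


# Toy genome: exon 1 on the plus strand, exon 2 on the minus strand.
genome = "ATGNNNNTTAGGG"
print(assemble_cds(genome, [(1, 3, "+"), (8, 13, "-")]))  # ATGCCCTAA
```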
Gene loss often accompanies the transition from autotrophy to parasitism in plants [13,41]. Overall, all 12 species of Santalales in this study retained some photosystem genes, suggesting that they retain relatively complete photosynthetic capacity despite the loss of some photosynthesis-related genes. The comparative analysis showed that both T. nigrans and S. parasitica have lost the ndh genes and infA from their chloroplast genomes, which is consistent with general models of plastome degradation [42]. The ndh genes play an essential role in plant photoautotrophy, and their expression has been linked to major evolutionary transitions in plants [43]. However, missing or pseudogenized ndh genes have been found in the chloroplast genomes of many parasitic plant species [13,37,38,44,45]. By comparing gene content, plastome structure, and selection pressure, Li et al. [45] found that hemiparasitism accelerates the pseudogenization and loss of plastid ndh genes in Orobanchaceae. Genetic changes in the parasitic plants of Santalales are usually characterized by the pseudogenization or loss of ndh genes [42].
Among the 12 species of Santalales in this study, ndhA in the SSC of S. jasminodora and the duplicated ndhB in the IR of V. minimum have both degenerated into pseudogenes. More importantly, except for E. scandens (an autotrophic plant), all ndh genes were missing in the remaining nine species of Santalales, including the autotrophic C. manillana. These results suggest that the photosynthetic capacity of these species gradually declined during their evolution toward heterotrophy, which may also reflect their increasing dependence on their hosts. In E. scandens, by contrast, the ndh gene set is more complete than in the other 11 species: ndhJ, ndhK, and ndhC in the LSC, the duplicated ndhB in the IR, and ndhF, ndhD, ndhE, ndhG, ndhI, ndhA, and ndhH in the SSC were all retained, and its chloroplast genome is the longest of the 12. These results suggest that E. scandens has a stronger photosynthetic capacity than hemiparasites such as T. nigrans and S. parasitica and is less dependent on a host. Our findings suggest that the lifestyle transition of parasitic plants is accompanied by relaxed purifying selection on chloroplast genes [7,42]. Nevertheless, chloroplast gene evolution remains comparatively conservative in hemiparasitic plants, and our study supports the idea that hemiparasitic plants retain the ability to produce organic matter through photosynthesis.
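The ndh complement described above can be summarized as a per-species tally of retained, pseudogenized, and lost genes among the 11 plastid ndh genes (ndhA to ndhK). The statuses below are transcribed from this section for four of the species; treating genes not explicitly mentioned as lost is an assumption, and the helper names are hypothetical.

```python
NDH_GENES = [f"ndh{letter}" for letter in "ABCDEFGHIJK"]


def tally_ndh(status):
    """Count ndh genes by status; genes absent from `status` count as 'lost'."""
    tally = {"retained": 0, "pseudogene": 0, "lost": 0}
    for gene in NDH_GENES:
        tally[status.get(gene, "lost")] += 1
    return tally


# Statuses from the text; unlisted genes are *assumed* lost.
species_status = {
    "E. scandens": {gene: "retained" for gene in NDH_GENES},
    "S. jasminodora": {"ndhA": "pseudogene"},
    "V. minimum": {"ndhB": "pseudogene"},
    "T. nigrans": {},
}

for species, status in species_status.items():
    print(species, tally_ndh(status))
```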

Conclusions
We used high-throughput sequencing to obtain the chloroplast genome sequences of two hemiparasitic species, T. nigrans and S. parasitica. Sequencing, assembly, annotation, and comparative analysis showed that the T. nigrans chloroplast genome is 121,419 bp and the S. parasitica chloroplast genome is 121,750 bp long. A total of 106 genes were annotated in each species, including 66 protein-coding genes, 28 tRNA genes, eight rRNA genes, and four pseudogenes. Among the 12 species of Santalales compared, E. scandens had the largest and S. jasminodora the smallest chloroplast genome. In T. nigrans and S. parasitica, all ndh genes encoding NAD(P)H dehydrogenase subunits have been lost or pseudogenized, some tRNA genes have been lost, and a few ribosomal protein genes and ycf genes have degenerated; these losses have substantially reduced the sizes of the SSC and LSC. Phylogenetic analysis showed that 11 of the 12 species clustered into two clades with high bootstrap support: Viscum (4 spp.), Osyris (Santalaceae), and Champereia (Opiliaceae) formed one clade, and Taxillus (3 spp.), Scurrula (Loranthaceae), and Schoepfia (Schoepfiaceae) formed another, while Erythropalum (Erythropalaceae) was the most distant clade within the Santalales. Consistent with this phylogeny, the chloroplast genomes of these species can be assigned to six categories based on gene loss, gene order, and genome structure. The phylogenetic relationships we recovered among the families of the 12 species are consistent with recently published phylogenies (e.g., Angiosperm Phylogeny Website, Version 14, 2017). Although different genes have been lost in different categories, most photosynthesis-related genes are retained in all 12 species.
Hence, the genetic information from the chloroplast genomes is consistent with the observation that these species are hemiparasitic. This study provides information for further research on chloroplast DNA evolution and on the phylogenetics and molecular ecology of the order Santalales, and our comparative genomic analyses provide a new case study of chloroplast genome evolution in parasitic plants.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.