Plastome Phylogenomics Provide Insight into the Evolution of Taxus

: The taxonomy of an ancient gymnosperm genus Taxus , with high value in horticulture and medicine, is perplexing because of few reliable morphological characters for diagnosing species. Here, we performed a comprehensive investigation of the evolutionary dynamics of Taxus chloroplast genomes and estimated phylogenetic relationships, divergence times, and ancestral distributions of Taxus species by comparing 18 complete chloroplast genomes. The variations across the chloroplast genome of different Taxus species indicated that remarkably varied genome variations across lineages have reshaped the genome architecture. Our well-resolved phylogeny supported that T. brevifolia Nutt. was basal lineages followed by the other North America lineages. Divergence time estimation and ancestral range reconstruction suggested that the Taxus species originated in North America in the Late Cretaceous and revealed that extant Taxus species shared a common ancestor whose ancestral distribution area was probably in North America and afterwards the earliest members expanded to Southeast Asia from where Chinese Taxus species originated. The predominant European species have more closer relationship with the Eastern Asian species and the speciation of Eurasia species arose from several dispersal and vicariance events in the Miocene. Genome-wide scanning revealed 18 positively selected genes that were involved in translation and photosynthesis system in Taxus , which might be related to the adaptive evolution of Taxus species. The availability of these complete chloroplast genomes not only enhances our understanding of the elusive phylogenetic relationships and chloroplast genome evolution such as conservation, diversity, and gene selection within Taxus genus but also provides excellent templates and genetic bases for further exploration of evolution of related lineages as well as for plant breeding and improvement.


Introduction
Taxus, honored as the 'living fossil of plants', is the most diverse and widespread genus within Taxaceae [1], and is one of the ancient species originating from the Late Cretaceous [2]. However, most Taxus species are either endangered or near threatened due to slow-growing, weak reproductive capacity, scattered distribution, and the illegal trade and excessive exploitation of its bark and leaves for taxol [3]. All native Taxus species in China have been listed as endangered and national first-class protected plants due to their narrow inhabit niche and decreasing population [4], and India has prohibited/restricted the export of native Taxus [5]. The International Union for Conservation of Nature's Red List of Threatened Species has recruited several species of Taxus that are threatened to different degrees, including T. floridana Nutt. ex Chapm., T. brevifolia Nutt., T. globosa Schltdl., T. contorta Griff., T. fuana Nan Li & R.R.Mill., T. chinensis (Pilg.) Rehder., and T. wallichiana Zucc. [6,7]. The lack of molecular biology research and genetic information between Taxus and related species seriously hinders the effective conservation and research of this rare and endangered species.
The taxonomic history of Taxus species has been complex and uncertain due to their similar morphological characteristics, whereas Taxus plants usually present high levels of phenotypic plasticity within species [8][9][10]. Moreover, the classifications and phylogenetic relationships between Taxus and Cephalotaxaceae and Podocarpaceae species are not fully understood. Extant Taxus plants are believed to have evolved from a common ancestor whose descendent lineages also includes the oldest recognizable Triassic fossil, Paleotaxus redivia, discovered in 200-million-year-old strata [11]. A mid-Jurassic relative is more identifiable as a member of extant Taxus, named T. jurassica Florin., closely resembling T. baccata L., T. cuspidata Siebold & Zucc., and T. brevifolia [12]. A few authors treated all previously described Taxus species as subspecies of T. baccata [13][14][15]. The monophyly of Taxus has been supported by both chemical and morphological characteristics in seeds and leaves and molecular evidence [16,17], but the phylogenetic relationships among Taxus species have been subjected to various controversies. For example, Pilger [13], Cope [18], and Farjon [19] only admitted 7-12 species in Taxus, with only five in China, whereas a phytogeographical analysis of extensive sampling of Taxus based on leaf anatomical characters resolved 24 species and 55 varieties in the genus, and China contains 16 species and seven varieties [20]. In addition, Liu et al. recognized totally 15 species/lineages in the genus based on a world-wide level of genetic and distribution exploration, which suggested that T. sumatrana (Miq.) de Laub. was assigned as T. chinensis or T. wallichiana according to their geographic position [21]. Li & Fu regarded T. yunnanensis W.C.Cheng & L.K.Fu. as a synonym of T. wallichiana, and they believed that T. wallichiana identified in FRPS as T. fuana [22]. Currently, T. fuana is a synonym of T. contorta [23,24]. Repeated ancient hybridization events were postulated as follows: T. florinii Spjut. = T. wallichiana (♀) × T. chinensis (♂), Emei type = T. chinensis (♀) × T. wallichiana (♂), and Qinling type = T. contorta (♀) × Huangshan type (♂) [23]. The Emei and Qinling type were considered as two additional cryptic species [25]. To data, the genus contains more than 10 species in many different habitats of the northern Hemisphere, covering Europe, North America, and Asia [21,[26][27][28], and these species are geographically separable rather than morphologically different. There are at least five species and one variety in Taxus that have been found within China, including T. chinensis, T. yunnanensis, T. cuspidata, T. madia Rehder., T. wallichiana, and T. chinensis var. mairei. Among these five species, T. chinensis and T. chinensis var. mairei originated from China which is one of the modern diversity centers of Taxus.
Molecular phylogenetic have partially resolved the evolutionary relationships between species and reconstructed the phylogeny of different Taxus species. Analysis of one nuclear (ITS) and five chloroplast (matK, psbA-trnH spacer trnL, rbcL, and trnL-trnF spacer) molecular markers clustered T. brevifolia and T. globosa in one single clade with T. baccata, while the endemic T. yunnanensis was included in a group with T. wallichiana, which is distantly related with the other four Taxus species in China; T. floridana is the first-branching taxon in Taxus genus, whereas T. fauna is closer to T. baccata than to other Taxus species [26]. Meanwhile, phylogenetic analysis of ten main Taxus species based on ITS region revealed that three North American species clustered a clade with high support, and T. brevifolia from the Pacific coastal showed a well-supported sister relationship to T. floridana from northwestern Florida and T. globosa from northern Mexico [29]. Plenty of studies have attempted to interpret the genetic diversity and genetic structure, species separation, phylogenetic and interspecific relationships based on nuclear microsatellite markers [30][31][32], random amplified polymorphic DNA [33], chloroplast DNA sequences, and internal transcribed spacer region of nuclear ribosomal DNA [21,26,34] markers. The next-generation sequencing, however, is more cost effective and faster to obtain a high number of gene targets or a whole genome than by traditional Sanger sequencing [35]. Although more conserved in the terms of structures and organization of gene content than nuclear genome in plants, the chloroplast DNA sequences have suffered many mutation events throughout the evolution of vascular plants, including InDels, substitutions, and inversions [36]. Therefore, it is imperative to employ the complete chloroplast genome sequences to resolve the genetic diversity and phylogenetic analysis at high taxonomical levels, and even in lower taxa [37,38].
Numerous phylogenetic studies revealed the phylogenetic origin and backbone relationships of gymnosperms based on whole chloroplast genome sequences [39,40]. Presently, many complete chloroplast genomic sequences of Taxus species have been published, and the unique structure of chloroplast genome such as gene content, genome-scale genomic rearrangements, and gene lost and gain events were identified in them [41][42][43]. However, the global pattern of variation and evolution of the whole chloroplast genomes in this genus have been ignored. Therefore, a further investigation of the phylogenetic relationships and divergence between species, chloroplast genome variations, and evolutionary dynamics of repeat sequences through all Taxus chloroplast genome sequences, is necessary to conserve and manage the germplasm resources, improve breeding, and search for favorable genes. In this study, we newly sequenced four complete chloroplast genomes of T. cuspidata, T. chinensis, T. yunnanensis, and T. wallichiana from China using Illumina technology. Furthermore, our assemblies were integrated with 14 previously assembled chloroplast genomes to yield the most comprehensive phylogenomic study of Taxus. This study aimed to (i) clarify phylogenomic relationships and estimate divergence times between Taxus species; (ii) survey the overall structural variation pattern of the whole chloroplast genome in Taxus species; (iii) investigate the evolutionary dynamics of repetitive sequences across Taxus chloroplast genomes during recent speciation; (iv) estimate the ancestral distribution and migration history of Taxus; (v) and search for Taxus chloroplast genes under positive selection. This study presents the most comprehensive phylogenomic analysis for 18 identified members within Taxus and provides a new paradigm for the investigation of plastome evolution of conifers.

Plant Materials and DNA Extraction
Healthy fresh needles of T. cuspidata, T. chinensis, T. yunnanensis, and T. wallichiana were collected from four single individuals located in Northeast Forestry University (Harbin, China), Xi'an Botanical Garden (Xi'an, China), Southwest Forestry University (Kunming, China), and Tibet Agricultural and Animal Husbandry University (Linzhi, Tibet, China), respectively. These leaves were divided into two parts and dried in silica gel and stored at −80 • C for genomic DNA extraction. Total genomic DNA was isolated by using a Dneasy Plant Mini Kit (Qiagen, CA, USA), followed by purification. Subsequently, the DNA concentration of each sample was quantified using a NanoDrop2000 spectrophotometer (Thermo Scientific, Carlsbad, CA, USA). The chloroplast genome sequences of 14 other Taxus species and Pseudotaxus chienii W.C.Cheng. were directly downloaded from NCBI, including T. brevifolia, T. globosa, T. floridana, T. phytonii, T. calcicola, T. mairei, T. canadensis, T. baccata, T. contorta, and T. fuana, and three potential cryptic eco-types, the Huangshan, Qinling and Emei type. All accession numbers are presented in Table S1.

Chloroplast Genome Sequencing, Assembly and Annotation
Shotgun genomic libraries were generated via fragmentation of qualified DNA samples. A DNA sample was randomly sheared to 250 bp fragments by using Covaris ultrasonicators (Covaris, Inc., Woburn, MA, USA), and the library was constructed by terminal repair, A-tail addition, adaptors ligation, purification, and PCR amplification. Libraries were then multiplexed and sequenced using the Illumina Hiseq 2500 high-throughput sequencing platform (Illumina Inc., San Diego, CA, USA) with 2 × 150 bp paired reads. Firstly, all the raw reads were trimmed using a CLC Genomics Workbench v9 (CLC Bio, Aarhus, Denmark) with the default parameters set. After trimming, high quality PE reads were assembled using the program MITObim v1.7 [44], with the published chloroplast genome of T. mairei (GenBank: KJ123824) [41] as a reference to obtain accurate sequences. The complete chloroplast genomes were annotated using DOGMA program [45], with the homology recognition threshold of protein-coding genes, tRNA, and rRNA coding genes of 60%, 85%, and 85%, respectively. Some genes with large variation and low homology with related species were annotated by manual correction with the homology recognition threshold of 25%-50%. The positions of start and stop codons and intron/exon boundaries were adjusted by BLAST searches by comparison with homologous genes in other chloroplast genomes of many related species, including T. mairei (KJ123824). The obtained tRNA genes were further confirmed using tRNAscan-SE [46], and an open reading frame was detected by ORF-Finder [47]. The Organellar Genome DRAW [48] was used to draw the circular plastid genome maps.

Phylogenomic Analyses
Phylogenomic analysis was implemented based on three datasets, the complete chloroplast genome sequences, a set of 82 common protein-coding genes and non-coding DNA sequences from 18 Taxus species, including four Chinese species sequenced in the current study, 14 other Taxus species, and one Pseudotaxus species used as an outgroup. The MAFFT v7.0 [49] was used to align the chloroplast genomic sequences from the finalized datasets, with the default parameters set, and adjusted manually where necessary. The best-scoring maximum-likelihood phylogenetic trees were inferred with RAxML v7.2 [50], performing 1000 bootstrap replicates. RAxML searches relied on the general time reversible (GTR) model of nucleotide substitution with the gamma distributed rate variation among sites and a proportion of invariable sites (GTR+G+I model). Occurrence of InDels and SNPs were mapped onto the derived phylogenetic trees constructed by ML analysis according to whole chloroplast genome alignment.

Inference of Divergence Time and Diversification Pattern
We also estimated the divergence time within Taxus species using Bayesian approach as implemented in BEAST program [51,52]. We tested whether a molecular clock hypothesis could be fitted to our data by employing the log-likelihood value with and without the molecular clock constraints by using MEGA v7.0 [53]. Our dataset strongly rejected a strict molecular clock (with clock, lnL = −286,819.299; without clock, lnL = −278,520.849; p < 0.001), suggesting that the evolutionary process is very complex. Therefore, we used a Yule process tree prior to model speciation and an uncorrelated log-normal relaxed-clock model to account for lineage-specific rate heterogeneity. In this model, genetic distances were transformed into absolute time (in million years) by using five fossil calibration points within the major conifer clades [54,55]. We undertook five independent Markov Chain Monte Carlo (MCMC) runs with 1,000,000 generations to confirm the convergence of the analysis, sampling every 1000 stages, and the initial 20% cycles were discarded as burn-in. The five runs were combined in LogCombiner (part of the BEAST-package). The final result was visualized in Tracer version 1.6 [56] to assess convergence and stationarity of each chain to the posterior distribution. The effective sample size (ESS) of each parameter exceeded 200 sufficient for stable estimates of parameters. The maximum clade credibility tree (MCC) was summarized using TreeAnnotator (part of the BEAST-package) with the initial 10% of trees discarded as burn-in. The derived MCC tree was visualized in Figtree version v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/, accessed on 3 August 2020). To obtain a sense of overall pattern of diversification rate change over time, a lineage-throughtime (LTT) analysis was employed to detect temporal shifts in diversification rates, as a function of evolutionary radiations within Taxus genus. The BEAST operations followed the parameters as above, and 1000 trees sampled from the converged Bayesion trees were used to generate the confidence intervals of the LTT plots.

Ancestral Area Reconstructions (AAR)
In order to reveal the origin and the historical biogeography of the hypothetical ancestor, the mechanism (vicariance/dispersal) underlying the geographically disjunct distribution of Taxus species, and the spatial patterns of biogeographical diversification within Taxus were estimated based on the Bayesian binary method (BBM) applied by RASP v4.0 [57] using the BEAST-derived chronogram of Taxus genus, with the outgroup removed from the tree. All parameters were kept at default. The geographical distribution of Taxus species was broadly categorized into four operational areas (labeled A-D) in accordance with their present distributions: (A) America, (B) Europe, (C) China, and (D) Japan. The current distribution information of Taxus species was mainly derived from our field investigation and GBIF (https://www.gbif.org/, accessed on 5 December 2020).

Genome-Wide Analyses of Genomic Structural Variants
In order to understand the full spectrum of genetic structural variation in Taxus chloroplast genome, we performed a comprehensive scanning of genome-wide structural variation based on complete chloroplast genome sequences of 18 Taxus species. The phylogeny-aware algorithm PRANK +F [58] was used to align all datasets with the parameters "-showanc -showevents +F". The phylogeny-aware algorithm performs better than the classical progressive alignment methods regarding the accuracy in InDels-rich sequences as PRANK posterior probabilities take into account the InDel events by distinguishing between the two types of events and retain the high-reliability regions with InDels. Particularly, if the "+F" option is switched on, PRANK flags already infer insertions at their sites as permanent insertions and prevent other insertions from being inferred in an overlapping position during the second round of alignment. This variant algorithm with permanent insertions outperforms the basic algorithm of PRANK provided the correct guide phylogeny and dense sampling of sequences. This indicated that InDels introduced along one branch of the phylogeny are probably introduced from an InDel event occurred in another branch, even if they overlap, and thus generates too long and gappy but potentially more accurate alignments.

Repeat Sequence Analyses
Three types of repetition were identified, including tandem, dispersed, and palindromic repeats. The minimum repeat sizes were set to 15, 20, and 30 bp for tandem, palindromic, and dispersed repeats, respectively. These three repetitive sequences were identified by first using the program DNAMAN v6.0. (Lynnon Biosoft, Vaudreuil, QC, Canada) and then manually filtering the redundant output.

Genome-Wide Scan for Protein-Coding Genes under Positive Selected and Accelerated Evolution
To explore the selective pressure of PCGs in Taxus, the CODEML program implemented in the PAML v4.9 package [59] was used to estimate the rate of non-synonymous (d N ) and synonymous (d S ) substitutions for PCG. We explored the selective pressure of all 82 chloroplast genes in Taxus genus by estimating the rate of non-synonymous (d N ) and synonymous (d S ) substitutions for all coding sequences with CODEML program [60] implemented in the PAML v4.9 package [59]. The ratio (ω) of d N to d S was used to determine the selective pressure, where ω > 1, ω = 1, and ω < 1 indicate higher positive selection pressure, neutral selection, and higher pressure of negative selection, respectively. PAL2NAL [61] was used to generate the codon-wise alignment of nucleotide sequences as the input sequences for CODEML guided by the peptide alignments. The ML phylogenetic tree based on whole chloroplast genome was used as an input tree. Before calculation, stop codons and gaps between the aligned sequences were deleted. In order to determine whether each shared coding sequence has experienced a different evolutionary force in different lineages, we assigned targeted branch(es) as the foreground branch where positive selection may have occurred and the remains served as background branches where only purifying selection or neutral evolution occurred. We adopted two-ratio models which compare one model that assumes sites to be under positive selection on the foreground branch and sites can evolve either neutrally or under purifying selection on the background branches with the null model in which all branches share the same evolutionary rate. The likelihood ratio test (LRT) with χ 2 distribution was used to evaluate the significant difference between the two models with a p < 0.05 significance threshold.

Phylogenomic Analysis
In this study, we explored evolutionary relationships among Taxus species by combining 14 published complete chloroplast genomes and the newly sequenced chloroplast genomes of T. cuspidata, T. chinensis, T. yunnanensis, and T. wallichiana. All Taxus chloroplast genomes displayed discrepant genome sizes ranging from 127,352 bp (T. florinii) to 129,752 bp (T. globosa) but fairly conserved gene content that commonly harbored an identical set of 82 annotated unique protein-coding genes, 25 tRNA genes, and four rRNA genes (Figures S1 and S2). Thus, three datasets, including whole chloroplast sequences, protein-coding genes, and non-coding region, were used to reconstruct phylogenetic trees based on ML method, with P. chienii as an outgroup. The fully resolved phylogenetic trees obtained by using the whole plastome sequences and protein-coding sequences have nearly identical topology to each other with high support values (Figures 1 and S3), whereas the tree constructed based on the non-coding sequences presented disparate topology with that based on the entire chloroplast genome and a total of 82 protein-coding genes ( Figure S4), indicating a different evolution pattern between coding and non-coding sequences. In the resulting ML phylogenies based on whole-plastome and coding regions, the four Taxus species originated from New World, T. brevifolia, T. canadensis, T. floridana, and T. globosa did not form a monophyletic clade. T. brevifolia was resolved to be the initial diverged species in Taxus genus and subsequently a sister relationship of T. floridana and T. globosa was well-supported. However, one of the unexpected findings is that the Emei type appear to be more distantly related to the basal lineage with high bootstrap values in comparison with other Old World Taxus species.
The 14 derived Taxus lineages further were divided into two main clades of which one branch was comprised of T. mairei, T. yunnanensis, T. calcicola, T. phytonii, and the Huangshan type and T. florinii, while the other included T. fuana, T. contorta, T. wallichiana, Qingling type, T. chinensis, T. baccata, T. cuspidata, and T. canadensis (Figures 1 and S3). This finding revealed a potential role of Emei type as the ancestral lineage of these clades and was indicative of multiple evolutionary origins of Chinese-specific Taxus species; however, the five Chinese species were strongly supported as a monophyletic clade with T. floridana and T. fuana in the phylogeny based on non-coding regions ( Figure S4). The topological differences were also observed when using non-coding sequences, indicating that T. floridana independently formed a monophyletic clade and failed to group with T. globosa within the New World Taxus species. Both phylogenies based on whole-plastome and coding sequences grouped T. contorta into monophyletic clade with T. wallichiana and T. fuana with 100% support, whereas the affinity of T. contorta to T. wallichiana and T. fuana was not confirmed by noncoding sequences: T. contorta formed a close relationship with Qingling type within a clade that is clearly separated from where it was expected to be. These topological differences suggested the need of more plastomes or nuclear sequences to resolve the evolutionary relationships between Taxus species.

A Genome-Wide Map of Genomic Variation across Taxus Chloroplast Genomes
Four new sequenced Taxus chloroplast genomes, together with the published chloroplast genomes, allowed a comprehensive evaluation of chloroplast genome-wi structural/single nucleotide variation (SNV), indicative of the evolutionary processes Taxus species. The SNVs, insertions, and deletions (InDels) were characterized in Tax chloroplast genomes with P. chienii as an outgroup, indicating that genomic variatio varied from one species to another. The overview of the distribution of genom structural/single nucleotide variation across Taxus chloroplast genomes reveal substantial variation but overall conservation of genome architecture across Taxus speci (Figures 2 and S5). The uneven distribution of genomic variation highlighted some fa evolving hotspots across Taxus chloroplast genomes, for example, ycf1, ycf2, accD, an some tRNA gene-rich regions. These highly variable regions can serve as the mo promising candidate for DNA barcodes in plastid genome to resolve closely relat lineages in phylogenetic analyses. In order to survey the occurrences of genomic varian along divergence branches that account for the evolution of Taxus chloroplast genom all genomic structural/single nucleotide variants were mapped onto the phylogenetic tr generated by using 18 Taxus chloroplast genome sequences with the phylogeny-awa algorithm [57], which was proven to outperform traditional approaches in resolvi

A Genome-Wide Map of Genomic Variation across Taxus Chloroplast Genomes
Four new sequenced Taxus chloroplast genomes, together with the published 14 chloroplast genomes, allowed a comprehensive evaluation of chloroplast genome-wide structural/single nucleotide variation (SNV), indicative of the evolutionary processes in Taxus species. The SNVs, insertions, and deletions (InDels) were characterized in Taxus chloroplast genomes with P. chienii as an outgroup, indicating that genomic variations varied from one species to another. The overview of the distribution of genomic structural/single nucleotide variation across Taxus chloroplast genomes revealed substantial variation but overall conservation of genome architecture across Taxus species (Figures 2 and S5). The uneven distribution of genomic variation highlighted some fast-evolving hotspots across Taxus chloroplast genomes, for example, ycf 1, ycf 2, accD, and some tRNA gene-rich regions. These highly variable regions can serve as the most promising candidate for DNA barcodes in plastid genome to resolve closely related lineages in phylogenetic analyses. In order to survey the occurrences of genomic variants along divergence branches that account for the evolution of Taxus chloroplast genomes, all genomic structural/single nucleotide variants were mapped onto the phylogenetic tree generated by using 18 Taxus chloroplast genome sequences with the phylogeny-aware algorithm [57], which was proven to outperform  Table S2). The size-frequency spectrum was consistent with observations from the nuclear genomes of both rice [62] and Arabidopsis lyrata [63] that larger InDels were less abundant than smaller ones. The size distribution of structural variations along lineages indicated that a clear excess of insertions over deletions was detected in most branches except T. florinii and the ancestral lineage leading to the Taxus clade, which suffered significant sequence loss since divergence from Pseudotaxus ( Figures 3B and S6B,C). This would account for the slight shrinkage of the Taxus chloroplast genome compared with The majority of discovered Indels in Taxus chloroplast genomes occurred on a small scale, approximately 95% of which were shorter than 100 bp, whereas 70% were smaller than 10 bp (Figures 3 and S6B,C; Table S2). The size-frequency spectrum was consistent with observations from the nuclear genomes of both rice [62] and Arabidopsis lyrata [63] that larger InDels were less abundant than smaller ones. The size distribution of structural variations along lineages indicated that a clear excess of insertions over deletions was detected in most branches except T. florinii and the ancestral lineage leading to the Taxus clade, which suffered significant sequence loss since divergence from Pseudotaxus ( Figures 3B and S6B,C). This would account for the slight shrinkage of the Taxus chloroplast genome compared with its close relative Pseudotaxus. Within Taxus species, T. brevifolia had the greatest number of both SNVs and insertions To determine the occurrence rates of these genome variations in Taxus plastomes, we time-calibrated the events of SNVs and InDels by aligning them with our time-scaled phylogeny yielded subsequently from Bayesian relaxed molecular clock method implemented in BEAST2 ( Figure 3A). Our results indicated that during the past ~29 Myr in the early Oligocene, the 18 Taxus species and their shared ancestral lineages had an average of ~8.8 insertions and ~7.5 deletions per million years, respectively ( Figure S6), whereas the single nucleotide mutation occurred at a speed of 32.8 per million years. The results also suggested that genome variations dramatically varied across lineages. Nonetheless, the estimated occurrence rates of genome variation (both structural and single nucleotide) were virtually constant along these lineages) (Figures 3A and S6A). This result was similar to the observation in the plastome or nuclear genomes of flowering plants such as rice [64,65] that the number of genomic variation events observed along different branches significantly correlated with branch lengths ( Figure S6D).
The examination of genomic positions of these predicted structural variations displayed that merely 2% of the detected InDels occurred in exons of protein-coding genes, suggesting their negative role in maintenance of the affected genes that would be constantly swept away by purifying selection during evolution. Further scrutiny of the length distribution of insertions and deletions within the 26 affected chloroplast proteincoding genes revealed that out of the all InDels were with length of multiple of three, To determine the occurrence rates of these genome variations in Taxus plastomes, we time-calibrated the events of SNVs and InDels by aligning them with our time-scaled phylogeny yielded subsequently from Bayesian relaxed molecular clock method implemented in BEAST2 ( Figure 3A). Our results indicated that during the past~29 Myr in the early Oligocene, the 18 Taxus species and their shared ancestral lineages had an average of 8.8 insertions and~7.5 deletions per million years, respectively ( Figure S6), whereas the single nucleotide mutation occurred at a speed of 32.8 per million years. The results also suggested that genome variations dramatically varied across lineages. Nonetheless, the estimated occurrence rates of genome variation (both structural and single nucleotide) were virtually constant along these lineages) (Figures 3A and S6A). This result was similar to the observation in the plastome or nuclear genomes of flowering plants such as rice [64,65] that the number of genomic variation events observed along different branches significantly correlated with branch lengths ( Figure S6D).
The examination of genomic positions of these predicted structural variations displayed that merely 2% of the detected InDels occurred in exons of protein-coding genes, suggesting their negative role in maintenance of the affected genes that would be constantly swept away by purifying selection during evolution. Further scrutiny of the length distribution of insertions and deletions within the 26 affected chloroplast protein-coding genes revealed that out of the all InDels were with length of multiple of three, suggesting strong negative selection on frameshift InDels that affected the chloroplast protein-coding genes.

Evolutionary Dynamics of Chloroplast Genome Repetitive DNA
The repetitive sequences contribute to chloroplast genome rearrangement, sequence divergence, as well as evolution [66,67]. Therefore, we explored the evolutionary dynamics of repetitive DNA sequences across 18 chloroplast genomes in order to determine their contribution to Taxus chloroplast genome variation and evolution. Overall, Taxus and P. chienii chloroplast genomes contained 395 repetitive sequences under three categories, namely tandem (Rt), palindromic (Rp), and dispersed (Rd) repeats that located within the gene spacer, coding regions, introns, and other regions (Table S2). Among them, the 282 Rt repeats, ranged from 15 to 406 bp in length, were the most abundant, accounting for 71.4% of total repeats; whereas Rp and Rd were 23.8% and 4.8% of entire repeats, respectively ( Figure S7B). These results revealed than Rt repeats were dominant and contributed the most to the Taxus chloroplast genome expansion. Although the repeats were mainly occurred within non-coding regions, a number of protein-coding genes also contained repeat sequences. Comparative genomic analysis demonstrated that the three types of repetitive DNA sequences presented different evolutionary behaviors across Taxus chloroplast genomes. To examine the rate of repeats inserted into or eliminated from the Taxus chloroplast genomes, we approximately mapped the occurrences of these repeat sequences onto the time-calibrated phylogenetic tree and obtained a rough estimation of the speed of their insertion into and/or removal from the Taxus chloroplast genomes ( Figure 4A). The identification of 77 insertion and 58 deletion events suggested a certain degree of proliferation of repeat sequences in Taxus chloroplast genome since the lineage divergence from their most common ancestor. Evolutionary dynamics of tandem repeats, represented by the abundance of copy number variation in chloroplast repeats across Taxus lineages, appeared to be more activated than Rd and Rp repeats, that mainly accumulated in the common ancestor and substantially persisted during the evolutionary history ( Figure 4A; Table S2). Both Pseudotaxus and the ancestral branch of Taxus underwent extensive loss of repetitive DNA sequences since their split. Our results also revealed that the majority of repeat insertion/deletion events occurred continually across timetabled branches, but the pace of repeat insertions or deletions differed greatly among even closely related Taxus species in their process of speciation. Comparisons of genomic variation (SNVs and InDels) intensity within Rt, Rp, Rd, and their flanking regions with genome background revealed that only the flanking regions of Rp harbored significantly high levels of InDels, whereas the flanking regions of Rt and Rd only displayed high levels of SNVs ( Figure 4B). These findings indicated that most chloroplast Rp repeats showed relatively slower substitution rates since their origin in the common ancestor of Taxus species than Rt and Rp repeats and suggested that Rt, Rp, and Rd repeats may display a different impact to accelerate the variation and evolution of Taxus chloroplast genome.

Selective Pressures in the Evolution of the Taxus Genus
To look into the molecular evolution of chloroplast protein-coding genes, all 18 Taxus chloroplast protein-coding genes were used to calculate the non-synonymous/synonymous rate ω (ω = d N /d S ). Unsurprisingly, rate heterogeneity existed among these Taxus lineages. Along the terminal branches, T. wallichiana (10.722) exhibited the highest ω value, followed by T. yunnanensis (8.772), while T. mairei (0.577) showed the smallest ω estimates. Meanwhile, T. contorta, the Qinling type, T. floridana, and T. globosa showed the relatively higher ω estimates while the remainder had intermediate values around 1 ( Figure S7). The elevation of ω ratios (here ω > 1) provides evidence of adaptation or relaxation of selective constraints of these lineages. Subsequently, positive selection signals on genes along the lineages in the phylogenetic tree were detected by using the p\optimized branch-site model to calculate the ω value changes at each codon on particular branches or clades in the Taxus phylogeny for all 82 chloroplast genes. We convincingly detected 18 genes (ropA, rpl33, rps11, rps19, petD, ycf 2, petB, ycf 1, chlN, chlL, matK, rps2, rpoC2, rpoC1, rpoB, rbcL, accD, and rps12) under positive selection by the LTR method and a p-value of 5% for statistical significance (FDR < 0.05) (Figure 5), accounting for 22% of all examined chloroplast genes.
In the T. floridana and T. globosa clade, only ycf 2 gene was under positive pressure, while ycf 2 and accD genes were positively selected in all examined clusters ( Figure 5), indicating their functional importance. These 18 positively selected genes were functionally assigned as translation genes (rpoA, rpoB, rpoC1, rpoC2, rpl33, rps2, rps11, rps12, and rps19), photosynthesis genes (petB, petD, chlL, chlN, and rbcL), and other genes (ycf 1, ycf 2, accD, and matK), indicating that the majority of genes under positive selection are relevant to translation and photosynthesis system. during the evolutionary history ( Figure 4A; Table S2). Both Pseudotaxus and the ances branch of Taxus underwent extensive loss of repetitive DNA sequences since their sp Our results also revealed that the majority of repeat insertion/deletion events occur continually across timetabled branches, but the pace of repeat insertions or deleti differed greatly among even closely related Taxus species in their process of speciati Comparisons of genomic variation (SNVs and InDels) intensity within Rt, Rp, Rd, a their flanking regions with genome background revealed that only the flanking region Rp harbored significantly high levels of InDels, whereas the flanking regions of Rt and only displayed high levels of SNVs ( Figure 4B). These findings indicated that m chloroplast Rp repeats showed relatively slower substitution rates since their origin in common ancestor of Taxus species than Rt and Rp repeats and suggested that Rt, Rp, a Rd repeats may display a different impact to accelerate the variation and evolution Taxus chloroplast genome.

Diversification History and Divergence Time Estimation in the Genus Taxus
We reconstructed diversification histories of Taxus genus by using the lineage-throughtime (LTT) plot to visualize the accumulation of lineages over time. Our visual examination of the diversification rate through time indicated an acceleration in diversification of Taxus lineages occurred since~29 Mya (ESS > 200) ( Figure 6A). Taxus is divergent at the molecular level with several distinct clades resolved in a chloroplast phylogenetic tree. We performed divergence time dating using five calibration points applied to the major conifer's clades complete chloroplast genome datasets. According to the estimations derived from the timecalibrated tree, the oldest speciation event in conifers was approximately 201.82 Mya with a credible interval ranging from 197.03 to 213.81 Mya (95% HPD) ( Figure 6B  lineage, T. brevifolia, split out from the most recent common ancestor. Subsequently, the divergence event within Taxus, which appears to have been the divergence of T. globosa/T. floridana from the other sections, occurred approximately~24.09 Mya (95% HPD, 15.12-33.92 Mya; node IV) during the Oligocene. The age estimate of the crown node for T. chinensis/T. cuspidata/T. wallichiana group and T. yunnanensis was dated to be 13.04 Mya (95% HPD, 8.13-18.58 Mya; node V). Subsequently, T. cuspidata diverged from T. chinensis/T. wallichianã 10.22 Mya (95% HPD, 6.31-15.00 Mya; node VI), and the split between T. chinensis and T. wallichiana likely occurred~4.85 Mya (95% HPD, 2.26-7.81 Mya; node VII) during the Miocene (Figure 6). Our estimates of most extant species divergence ages within Taxus were often very young, typically concentrated in the last several million years, consistent with the fast accumulation of lineages in the LTT analysis. around 1 ( Figure S7). The elevation of ω ratios (here ω > 1) provides evidence of adapt or relaxation of selective constraints of these lineages. Subsequently, positive sele signals on genes along the lineages in the phylogenetic tree were detected by usin p\optimized branch-site model to calculate the ω value changes at each codo particular branches or clades in the Taxus phylogeny for all 82 chloroplast genes convincingly detected 18 genes (ropA, rpl33, rps11, rps19, petD, ycf2, petB, ycf1, chlN,  matK, rps2, rpoC2, rpoC1, rpoB, rbcL, accD, and rps12) under positive selection by the method and a p-value of 5% for statistical significance (FDR < 0.05) (Figure 5), accou for 22% of all examined chloroplast genes. In the T. floridana and T. globosa clade, onl gene was under positive pressure, while ycf2 and accD genes were positively select all examined clusters ( Figure 5), indicating their functional importance. Thes positively selected genes were functionally assigned as translation genes (rpoA, rpoC1, rpoC2, rpl33, rps2, rps11, rps12, and rps19), photosynthesis genes (petB, petD,  chlN, and rbcL), and other genes (ycf1, ycf2, accD, and matK), indicating that the ma of genes under positive selection are relevant to translation and photosynthesis syst

Diversification History and Divergence Time Estimation in the Genus Taxus
We reconstructed diversification histories of Taxus genus by using the lin through-time (LTT) plot to visualize the accumulation of lineages over time. Our v examination of the diversification rate through time indicated an acceleratio diversification of Taxus lineages occurred since ~29 Mya (ESS > 200) ( Figure 6A). Ta divergent at the molecular level with several distinct clades resolved in a chloro phylogenetic tree. We performed divergence time dating using five calibration p applied to the major conifer's clades complete chloroplast genome datasets. Accordi the estimations derived from the time-calibrated tree, the oldest speciation eve

Ancestral Areas Reconstruction
The results from BBM of biogeographic analysis showed that extant Taxus species shared a common ancestor whose ancestral distribution area was probably North America (A) and the earliest members expanded to Southeast Asia from where Chinese Taxus species originated. Afterwards China became the diversification center that led to a wave of speciation events and harbored at least 15 extant species of Taxus. We identified within Taxus genus three dispersal and two vicariance events that contribution to the spread and speciation of Taxus genus (Figure 7). Mya (95% HPD, 6.31-15.00 Mya; node VI), and the split between T. chinensis and wallichiana likely occurred ~4.85 Mya (95% HPD, 2.26-7.81 Mya; node VII) during Miocene ( Figure 6). Our estimates of most extant species divergence ages within T were often very young, typically concentrated in the last several million years, consis with the fast accumulation of lineages in the LTT analysis.

Ancestral Areas Reconstruction
The results from BBM of biogeographic analysis showed that extant Taxus spe shared a common ancestor whose ancestral distribution area was probably North Ame (A) and the earliest members expanded to Southeast Asia from where Chinese T species originated. Afterwards China became the diversification center that led to a w of speciation events and harbored at least 15 extant species of Taxus. We identified wi

ER REVIEW 13
Taxus genus three dispersal and two vicariance events that contribution to the spread speciation of Taxus genus (Figure 7).  Figure 5B). The insert map shows the four areas ("A" "C", "D") used in the analyses and the distribution range of Taxus genus in world. Pie charts ind the proportion of the ancestral ranges. The color key identifies possible ancestral ranges at dif nodes. The black pentagram and red rhombus indicate the dispersal and vicariance e respectively.

Discussion
Unraveling the genomic relationships and evolutionary history of Taxus can he clarify the phylogeny among Taxus species and infer the origin, evolutionary history spread pattern of a given plant clade. In the present study, four complete chloro genomes of Taxus species explicitly provided new and valuable information to formidable challenges and controversies on the evolution of the genus. Subsequently combined 14 other published chloroplast genomes to explore evolutionary relation that allow drawing a full picture of Taxus chloroplast genome evolution. Our finding the foundation for future exploration of more details of the evolution of Taxus, as w the molecular identification of Taxus species.
The sequenced chloroplast genomes promise a highly reliable phylogenetic derived from the whole chloroplast genome sequences, protein-coding genes, and coding regions, rooted with P. chienii as an outgroup. The phylogenetic trees, constru based on the whole chloroplast sequences and protein-coding genes produced si topological structures but were incongruent with that generated from non-coding reg possibly because non-coding regions are more variable and provide relatively  Figure 5B). The insert map shows the four areas ("A", "B", "C", "D") used in the analyses and the distribution range of Taxus genus in world. Pie charts indicate the proportion of the ancestral ranges. The color key identifies possible ancestral ranges at different nodes. The black pentagram and red rhombus indicate the dispersal and vicariance events, respectively.

Discussion
Unraveling the genomic relationships and evolutionary history of Taxus can help to clarify the phylogeny among Taxus species and infer the origin, evolutionary history, and spread pattern of a given plant clade. In the present study, four complete chloroplast genomes of Taxus species explicitly provided new and valuable information to solve formidable challenges and controversies on the evolution of the genus. Subsequently, we combined 14 other published chloroplast genomes to explore evolutionary relationships that allow drawing a full picture of Taxus chloroplast genome evolution. Our findings laid the foundation for future exploration of more details of the evolution of Taxus, as well as the molecular identification of Taxus species.
The sequenced chloroplast genomes promise a highly reliable phylogenetic tree derived from the whole chloroplast genome sequences, protein-coding genes, and non-coding regions, rooted with P. chienii as an outgroup. The phylogenetic trees, constructed based on the whole chloroplast sequences and protein-coding genes produced similar topological structures but were incongruent with that generated from non-coding regions, possibly because non-coding regions are more variable and provide relatively more variable loci than protein-coding genes. There is a general agreement that Taxus is a monophyletic genus in the Taxaceae family supported by both morphological [68] and molecular evidence [26,34]. In the Taxus genus, T. floridana is resolved as the first diverging species based on five chloroplast (matK, rbcL, trnL, trnL-trnF spacer, and psbA-trnH spacer) and one nuclear (ITS) molecular marker. However, Fu et al. [43] suggested that T. brevifolia is the first-branching species inferred from concatenation of three locally co-linear blocks, which is congruent with our results (Figures 1, S3 and S4). Our analysis resolved T. floridana as the sister of T. globosa with 100% support, which is also supported by Gao et al. [69]. Similar to the results of Fu et al. [43], our results indicated that all New World Taxus (T. brevifolia, T. globosa and T. floridana) except T. canadensis, clustered into a clade separated from New World species. Unlike the phylogeny of Taxus inferred from non-coding sequences, the endemic T. yunnanensis clusters with Huangshan and Emei type were in a subclade and was distantly related with other four Taxus species native to China (Figures 1 and S3). However, the inferred relationship of T. yunnanensis in the study of Hao et al. [34] was quite different from our analysis based on one chloroplast and three nuclear taxadiene synthase, which supported the sister relationship between T. yunnanensis and T. wallichiana. The Huangshan type was shown to be closer to the Emei type than to the Qinling type, by contrast, the concatenated alignment of five barcodes (rbcL, matK, psbA-trnH, trnL-trnF, and ITS) dataset supports a closer relationship between the Qinling and Emei type [21]. Nonetheless, the whole chloroplast genome sequences used by Fu et al. [43] presented a nested clade among the Huangshan type, T. chinensis and T. florinii. Additionally, similar to the result of Fu et al. [43], our results supported that the Qinling type was closer to T. baccata of Europe and T. contorta of West Himalaya (Figures 1, S3 and S4). For a long time, T. floridana was treated as a variety of T. canadensis, suggesting a close relationship between these two species [70]. The examination of stomatal structure and ITS phylogenetic tree [29] in Taxus [71], however, did not group these two species into a monophyletic clade; instead, T. canadensis strongly allied with the Old World species, which was congruent with our results (Figures 1, S3 and S4). T. chinensis has been treated as a variety of either T. wallichiana [72], T. baccata [73], or T. cuspidata [13], while in our phylogenomic tree, T. chinensis and T. wallichiana formed a strongly supported clade, with T. cuspidata more distantly related.
Our ancestral distribution reconstruction analysis 100% support that the ancestor of the extant Taxus lineages originated in North America. The oldest Taxus fossils found in North America can be tracked back to the Late Cretaceous [2] and there seems to be compelling evidence for the existence of Taxus in that area since then. Based on reliable calibration point and our robust phylogenetic estimation, our molecular dating also showed that the ancestor of modern Taxus genus diverged at~67.08 Mya from its closest relative Pseudotaxus in North America (Figures 6 and 7). However, it should be noted that Pseudotaxus only occurred in China [74], which proposed an alternative scenario of origin of the ancestor leading to modern lineages of Taxus. Moreover, significant older fossils of Taxus have been discovered in Eurasia tracing back to the Lower Cretaceous [75][76][77], suggesting a more ancient origin of Taxus than what we estimated. Nevertheless, the little morphological difference in phyllotaxis and female reproductive structures increased the level of challenge to distinguish their fossil characters. One feasible explanation placed that fossil on the node of the common ancestor of Pseudotaxus and Taxus [23]. Taken together, more extensive fossil and molecular evidence of related lineages are required to confirm the North America origin hypothesis.
Our divergence analyses dated an origin for the Taxus genus back to the Upper (Late) Cretaceous, with the extant lineages diversifying in North America and Asia much later during the Oligocene/early Miocene. This age is in accordance with previous estimate based on a small number of chloroplast and nuclear gene fragments [23,26] but significantly younger than the Eocene epoch inferred by Ran et al. [78]. The timescale of divergence between Taxus and Pseudotaxus overlapped with the Cretaceous-Paleogene extinction event and a subsequent global cooling that might have significantly affected contemporaneous Taxus or Pseudotaxus populations in North America. The surviving populations in refuges might have rapidly diversified in the early Miocene in North America and migrated to Eurasia in the late Lower Miocene via the Bering land bridge. This period coincided with the intensification of the East-Asian summer monsoon and a climate optimum with warmer temperatures [79]. The China population was further divided into two major clades at 13 Mya, a period in line with uplift-driven diversification around the Hengduan Mountains and the intensification of a cooler drier climate. Our research revealed a more recent divergence between European (T. baccata) and Asian (T. cuspidata, T. contorta/T. fuana, T. chinensis, and T. wallichiana) lineages, which support the Asia-to-Europe migration of the species in the late Miocene (~6.9). The Japanese yew T. cuspidata, which mainly occupied Japan and Northeast China, diverged with a common ancestor shared with American yew T. canadensis at that time (~6.15 Mya), after which T. canadensis re-migrated to North America via the Bering land bridge. Additionally, the Qinghai-Tibetan Plateau (QTP) uplift may have been an important factor that contributed to the formation of geographical disjunct distribution of Taxus species [80], which prevented the dispersal of Taxus species between Europe and Asia [81,82] in the late Miocene, a period overlapped with our estimation of the divergence time between the European and East Asian lineages. Similarly, Möller et al. [23] also revealed that the ancestors of extant lineages of Taxus have undergone repeated inter-and intra-continental migrations and linked the diversification/speciation of Taxus with the orogeny of the Qinling and Hengduan Mountains, especially around the Sichuan basin. Similar to our results, Turgai Strait, QTP, and climate change have been proven to serve as main factors affecting the distribution of Forsythia [81]. Thus, our results provide molecular support for the vicariance hypothesis and identify it as the main source of the differentiation in Taxus, upon which, some taxonomists classify the Taxus species according with their geographical distribution [10]. Overall, climate changes have likely provided an opportunity for accelerated diversification of the lineages [83].
The taxonomy of Taxus has been a contentious issue due to the few credible morphological characters for delineating species. In the present study, we successfully used complete chloroplast genomes to reserve of the phylogenetic history of 18 described extant Taxus species, and to produce the first sequence-based map of Taxus chloroplast genomic structural variation and site substitution in the course of recent plant speciation events. Based on phylogenetic reconstruction, molecular dating, ancestral range reconstruction, and fossil records, our study reveals that Taxus were estimated to have originated during the Cretaceous in North America with evidence for rapid diversification in Eurasia coinciding with several geological events~29 Mya. Our result also supports that the Taxus genus is monophyletic and three North American species do not form a clade and together are inferred as the earliest diverging lineages; the predominantly European species have a closer relationship with the Eastern Asian species.
Furthermore, we estimated the average occurrence rate on average to be~8.8 insertions and~7.5 deletions per Myr in the Taxus genus, and we found that the number of insertions and deletions is nearly equivalent among Taxus chloroplast genomes. Comparative genomic analysis of repeat sequences suggests that the Rt repeat was dominant and contributed the most to the Taxus chloroplast genome expansion. Finally, we detected a total of 18 genes under positive selection by accomplishing genome-wide scanning. Moreover, the variation in repetitive sequences is shared among the species of Taxus, supporting that they are effective molecular markers for species identification and discovery in Taxus genus.

Conclusions
Many ecological and geographical variants of Taxus species have been formed in the process of long-term effects of the geographic environment and anthropogenic influence. In order to protect and adopt these important species efficiently, greater knowledge of the genetic background and evolutionary dynamics of Taxus is needed. In our study, we reconstructed the phylogenetic history and investigated the genomic variations of Taxus species based on 18 whole chloroplast genomes. The phylogenetic tree supported that T. brevifolia was basal lineages followed by the other North America lineages. We also found evidence that the Taxus species originated in North America in the Late Cretaceous with evidence for rapid diversification in Eurasia coinciding with several geological events, and the earliest members expanded to Southeast Asia. In addition, we found that the number of insertions and deletions is nearly equivalent among Taxus chloroplast genomes. Genomewide scanning revealed 18 positively selected genes that were involved in translation and photosynthesis system in Taxus, which might be related to the adaptive evolution of Taxus species. These complete chloroplast genome sequences provide valuable resources for investigating the origin, evolution, and plant adaptation, and facilitate future species distinguishing and conservation biology of this ancient globally distributed genus.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/f13101590/s1, Figure S1: Gene maps of the 18 Taxus species chloroplast genomes. Genes inside the circle are transcribed in a clockwise direction, and genes outside of the circle are transcribed counterclockwise. Genes belonging to different functional groups are marked in different colors. Dashed area in the inner circle indicates GC content of the chloroplast genome; Figure S2: Visualization of sequence alignment of the 18 chloroplast genomes, with P. chienii as an outgroup; Figure S3: ML phylogeny of Taxus inferred from the concatenated sequences of 82 genes of chloroplasts using P. chienii as an outgroup; Figure S4: ML phylogeny of Taxus inferred from the concatenated sequences of non-coding sequences of the chloroplast genome using P. chienii as an outgroup; Figure S5: A map of chloroplast nucleotide variation across the Taxus choloplast genomes; Figure S6 Figure S7: The ω (d N /d S ) of chloroplast protein-coding genes in Taxus species using P. chienii as an outgroup. Table S1 Accession numbers of Taxus species used in our study. Table S2: Comparisons of repeat sequences across the Taxus choloplast genomes using P. chienii as outgroup.
Author Contributions: Conceptualization, X.L. and X.J.; software, S.F.; formal analysis, S.F.; investigation, H.Z.; writing-original draft preparation, X.J.; writing-review and editing, S.F. and H.Z.; funding acquisition, X.J. All authors have read and agreed to the published version of the manuscript.
Funding: This research was funded by Shaanxi Forestry Science and technology Innovation Project (SXLK2021-01-03).