Phylogenetics of Taxus Using the Internal Transcribed Spacers of Nuclear Ribosomal DNA and Plastid trnL-F Regions

Taxus is a genus of trees and shrubs with high value in horticulture and medicine as a source of the anticancer drug paclitaxel. The taxonomy of the group is complex due to the lack of diagnostic morphological characters and the high degree of similarity among species. Taxus has a wide global geographic distribution and some taxonomists recognize only a single species with geographically defined subgroups, whereas others have described several species. To address these differences in taxonomic circumscription, phylogenetic analyses were conducted on DNA sequences using Maximum Likelihood, Bayesian Inference and TCS haplotype networks on single and combined gene regions obtained for the nuclear ribosomal ITS region and the plastid trnL intron and trnL-F intergenic spacer. Evidence is presented for the sister group status of Pseudotaxus to Taxus and the inclusion of Amentotaxus, Austrotaxus, Cephalotaxus and Torreya within Taxaceae. Results are consistent with the taxonomic recognition of nine species: T. baccata, T. brevifolia, T. canadensis, T. cuspidata, T. floridana, T. fuana, T. globosa, T. sumatrana and T. wallichiana, but evidence is found for less species distinction and considerable reticulation within the T. baccata, T. canadensis and T. cuspidata group. We compare the results to known taxonomy, biogeography, present new leaf anatomical data and discuss the origins of the hybrids T. ×media and T. ×hunnewelliana.


Introduction
Morphological differences between species of Taxus L. are slight and individuals within species have high levels of phenotypic plasticity, their morphology varying considerably with the environment [1]. This often leads to mis-identification and problems with classification using traditional taxonomic methods, which rely solely on morphological characters. Indeed, the delimitation of species in Taxus has been a long-standing taxonomic problem. Pilger [2] and Elwes and Henry [3] classified Taxus as monotypic with several geographical subspecies, a view supported by Dempsey and Hook [4] who provided evidence based on needle morphological variation and chemical characteristics. Furthermore, Dempsey [5] conducted a morphological and phytochemical analysis to investigate intra-and inter-generic relationships of Taxus and related genera, and concluded that few characters could be used to distinguish among Taxus species, but found that a clear distinction between the sister taxa Cephalotaxus and Torreya could be resolved.  [10]. Groupings of ITS sequences in relation to the biogeography of Taxus are also shown. Blue, red and purple ellipses represent phylogenetic groupings from the TCS network analysis ( Figure 4).
The importance of species delimitation and classification of Taxus goes beyond taxonomy, taxonomic identification for horticulture and comparative biology. The genus comprises members that synthesize the anti-cancer taxane drug paclitaxel (Taxol) and incorrect species identification can hinder cultivation and drug production efforts [4,11]. All Taxus species in this study produce paclitaxel in varying amounts [4,11]. Phytochemical studies, examining plants collected from different regions of the world, require a stable nomenclature and species identification. Furthermore, an accurate phylogenetic reconstruction is required to infer how paclitaxel may have evolved and synthesized, which may have biotechnological implications.
Accurate species identification is also required for conservation of Taxus species and forests [12]. According to the International Union for Conservation of Nature (ICUN) Red List, several species of Taxus are threatened to different degrees [13][14][15][16][17][18]. Taxus brevifolia and Taxus mairei (=T. wallichiana var. mairei (Lemée and H.Lév.) L.K.Fu and Nan Li) are classified as Near Threatened, and Vulnerable, respectively, with their numbers currently decreasing. A major part of this decline is attributed to logging [17] and the harvesting of bark for paclitaxel production, although this exploitation has largely stopped, following the development of a semi synthetic process to produce paclitaxel from T. baccata. Taxus wallichiana, Taxus globosa, Taxus chinensis (=T. wallichiana var. chinensis (Pilg.) Florin), Taxus fauna and Taxus contorta Griff. (=T. wallichiana ssp. contorta (Griff.) Silba) are all classified as Endangered [13,15,16,19]. This is due to the over exploitation of T. walliachiana and T. chinensis for paclitaxel production. Deforestation has caused populations of Taxus globosa to decline; T. globosa is not yet exploited for paclitaxel production. Taxus fuana according to the Red List has become endangered due to over exploitation associated with medical use along with over-collection for fuel and fodder. Shah et al. [20] observed that the populations of T. fuana were declining, mainly due to human pressure from habitat destruction, deforestation and over-exploitation for fuel, fodder, timber  [10]. Groupings of ITS sequences in relation to the biogeography of Taxus are also shown. Blue, red and purple ellipses represent phylogenetic groupings from the TCS network analysis (Figure 4).
The importance of species delimitation and classification of Taxus goes beyond taxonomy, taxonomic identification for horticulture and comparative biology. The genus comprises members that synthesize the anti-cancer taxane drug paclitaxel (Taxol) and incorrect species identification can hinder cultivation and drug production efforts [4,11]. All Taxus species in this study produce paclitaxel in varying amounts [4,11]. Phytochemical studies, examining plants collected from different regions of the world, require a stable nomenclature and species identification. Furthermore, an accurate phylogenetic reconstruction is required to infer how paclitaxel may have evolved and synthesized, which may have biotechnological implications.
Accurate species identification is also required for conservation of Taxus species and forests [12]. According to the International Union for Conservation of Nature (ICUN) Red List, several species of Taxus are threatened to different degrees [13][14][15][16][17][18]. Taxus brevifolia and Taxus mairei (=T. wallichiana var. mairei (Lemée and H.Lév.) L.K.Fu and Nan Li) are classified as Near Threatened, and Vulnerable, respectively, with their numbers currently decreasing. A major part of this decline is attributed to logging [17] and the harvesting of bark for paclitaxel production, although this exploitation has largely stopped, following the development of a semi synthetic process to produce paclitaxel from T. baccata. Taxus wallichiana, Taxus globosa, Taxus chinensis (=T. wallichiana var. chinensis (Pilg.) Florin), Taxus fauna and Taxus contorta Griff. (=T. wallichiana ssp. contorta (Griff.) Silba) are all classified as Endangered [13,15,16,19]. This is due to the over exploitation of T. walliachiana and T. chinensis for paclitaxel production. Deforestation has caused populations of Taxus globosa to decline; T. globosa is not yet exploited for paclitaxel production. Taxus fuana according to the Red List has become endangered due to over exploitation associated with medical use along with over-collection for fuel and fodder. Shah et al. [20] observed that the populations of T. fuana were declining, mainly due to human pressure from habitat destruction, deforestation and over-exploitation for fuel, fodder, timber and farming. The most endangered species is Taxus floridana, which is listed as Critically Endangered [14].
Molecular phylogenetic approaches offer the potential to better understand and resolve the relationships within taxonomically difficult groups including Taxus and a number of studies exist that provide insight into the evolution of the genus and its relatives. Chaw et al. [21] and Cheng et al. [22] used DNA sequences of 18S ribosomal DNA, internal transcribed spacer regions of nuclear ribosomal DNA and matK plastid DNA to clarify the phylogenetic position of Taxaceae in the gymnosperm order Taxales. There is also evidence for the monophyly of Taxus [21,23], but the relationships of its species remain understudied and not fully resolved [10,[21][22][23]. Extant yews are believed to have evolved from a group that includes the fossil, Paleotaxus redivia [24]. Triassic P. redivia existed over 200 million years ago and is believed to be the oldest yew according to fossil records [24]. A mid-Jurassic relative (140 myr old) is said to be more recognizable as a member of Taxus, and hence, named T. jurassica [25]. A Quaternary fossil yew, T. grandis, is probably a synonym of T. baccata [26].
Some other work has examined hybridization in the genus. Collins et al. [1] examined species distinction of Taxus baccata, T. canadensis, and T. cuspidata using Randomly Amplified Polymorphic DNA fingerprinting (RAPD) and DNA sequences and undertook an analysis of their reputed hybrids. They reported polymorphism in the plastid trnL-F region that could be used for species identification in combination with RAPD and found three different plastid DNA haplotypes [1]. They confirmed the hybrid origin and parentage of the Taxus hybrids (T. cuspidata × T. canadensis = T. ×hunnewelliana and T. baccata × T. cuspidata = T. ×media) using RAPD fingerprinting.
The aim of this study was to examine the phylogenetic relationships within Taxus by comparing nucleotide sequences obtained from the internally transcribed spacer region (ITS) of 18S-26S nuclear ribosomal DNA and the trnL intron and the trnF intergenic spacer region of plastid DNA (trnL-F). These regions have been used extensively in systematic studies of other groups of plants for investigating relationships at different taxonomic levels [27][28][29][30][31][32][33] and have also been applied to limited samples of Taxus. Representatives of Austrotaxus, Amentotaxus, Cephalotaxus, Pseudotaxus, and Torreya were also included to assess inter-relationships in Taxaceae and related families (Cephalotaxaceae and Amentotaxaceae). Cephalotaxus was sister to Taxus in Cheng et al. [22] and Hao et al. [33] and Taxaceae comprises Amentotaxus, Taxus and Torreya [7]. Cephalotaxus is sometimes included in Taxaceae [34]. Podocarpus belongs to Podocarpaceae [35]. A large-scale phylogenetic analysis of the group, including all currently accepted Taxus species according to the Plant List [9], was conducted and we also examined evidence for evolutionary reticulation using network analyses and conducted adetailed assessment of the putative hybrid taxa, T. ×media and T. ×hunnewelliana.

Specimens
Fresh plant material was obtained from the National Botanic Garden, Glasnevin, Ireland. DNA samples were also available from the Trinity College Dublin (TCD) DNA Bank. Voucher specimens were kept for each sample, dried and stored in the TCD Herbarium. Some ground and dried leaf samples were also available from our previous work on the genus (Table 1).

DNA Extraction and Sequencing
DNA was extracted from 0.05-0.075 g ground leaf material or 0.075-0.1 g of fresh material using a modified hot CTAB method of Doyle and Doyle [36][37][38]. DNA was precipitated using 100% isopropanol, pelleted and washed with 70% ethanol and purified using the JETquick Spin Columns (GENOMED Gmbh, Lohne, Germany) following the manufacturer's instructions. DNA was then stored in TE buffer (10 mM Tris/HCL, 1 mM EDTA, pH 8.0) at −80 • C until required.
The forward and reverse primers of Sun, Skinner [28] were used for amplification and sequencing of the ITS region. The internal primer 5.8 of Liston et al. [39] was also used as a sequencing primer because of the long length of the amplicons. The trnL intron and the trnF spacer (hereafter the trnL-F region) were amplified and sequenced as one segment using primers "c" and "f" of Taberlet et al. [27] and internal trnL-F primers "e" and "d" when necessary. PCRs for both regions were carried out in 12.5 µL reactions using BIOLINE Biomix (Bioline Reagents, London, UK). Both, the trnL-F and ITS PCR amplifications were prepared using 4.95 µL ultrapure water, 6.25 µL Biomix, 0.5 µL (10 pmol) forward and reverse primers and 0.3 µL of column cleaned total DNA (ca. 100 ngµL −l ). The reaction conditions for trnL-F were as follows: denaturation at 95 • C for 1 min 30 s followed by 30 cycles of 45 s at 95 • C, 45 s at 50 • C, 2 min at 72 • C and a final extension at 72 • C for 7 min in an Applied Biosystems Verti 96 well thermal cycler. 3 µL PCR products were run on a 1.2% agarose gel to check for amplification. Successfully amplified DNA fragments were purified using the ExoSap method. 0.3 µL of exonuclease (New England Biolabs, MA, USA; 20 U/µL), 2 µL of shrimp alkaline phosphatase (New England Biolabs; 1 U/µL) and 7 µL of sterile ultrapure water was added to 5 µL of PCR product and incubated at 37 • C for 30 min followed by 82 • C for 20 min. Purified PCR products were then sequenced using BigDye Terminator (Applied Biosystems, CA, USA) cycle sequencing and an Applied Biosystems 3130 xl Genetic sequencer according to the manufacturers protocol with the same primers used for initial amplification.
Some samples of both putative hybrids and non-hybrids required cloning due to the heterogeneity of the PCR product. Cloning was performed using a Thermo Scientific CloneJET PCR cloning kit (Fermentas, Vilnius, Lithuania). The PCR product was inserted into the pJET1.2/blunt cloning vector that was then transformed into E. coli cells. The cells were incubated and grown overnight at 37 • C. Eight single colonies where chosen randomly from the agar plate and a PCR was performed on each colony using the same primers as the initial pre-cloning amplification. A small part of the colony was picked directly from the agar and placed directly in the reaction using a sterile pipette tip. The parameters for the PCR were the same as above with the exception that there was an additional 10 min pre-melt at 94 • C and 10 min final extension at 72 • C. The cloned PCR products were purified using JETquick spin columns and sequenced as described above.

Sequence Analysis and Phylogenetic Reconstruction
To obtain a contiguous sequence for the target DNA region, forward and reverse sequence reads were assembled in Geneious Pro 11.1.4. (Biomatters Ltd., Auckland, NZ). The sequences were aligned in Geneious using highest sensitivity, either MUSCLE or Geneious algorithms with default settings. The sequences were then manually aligned if necessary. The aligned matrix was imported into MEGA 7 [40] for Maximum Likelihood (ML) phylogenetic analyses [40] and also MrBayes [41] for Bayesian Inference (BI) phylogenetic tree reconstruction.
For the ML analyses, the best fit substitution model was determined by the Model Selection function in MEGA 7 [40] and was found to be the Tamura 3-parameter model (T92) with gamma distributed rate heterogeneity and estimated proportion of invariant sites (G+I) for both the ITS and the trnL-F gene regions (and for the combined analyses). ML was performed in MEGA 7 with 1000 replicates of random sequence addition and nearest neighbor interchange (NNI) branch swapping. Initial tree(s) for the heuristic search were obtained automatically by applying Neighbor-Joining and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and then selecting the topology with superior log likelihood value. Bootstrap support (BS) was calculated from 1000 replicates with the same settings as the initial search following [42].
Bayesian Inference of phylogeny was performed using MrBayes version 3.2.6 [41] for each gene region separately and also the combined matrix of all DNA sequences. The T92+G +I best-fit nucleotide substitution model was used as determined for the ML analyses above. Four parallel Markov chain Monte Carlo (MCMC) chains were run for 25,000,000 generations with trees sampled every 1000 generations, and 25% of trees were discarded as burn-in.

Leaf Impressions
To help with the taxonomic identification of samples, leaf impressions were taken from a selection of samples in the field to visualize the rows of stomata, according to the method of Sarvella et al. [45]. Clear nail varnish and Sellotape was used to create an impression of the abaxial side of the leaf on a slide. The leaf impressions were examined and photographed under a stereomicroscope Leica DM500 at 10× and 20× magnification.

Results
The aligned trnL intron +trnL-F intergenic spacer matrix had 132 sequences and, with unalignable regions removed, was 486 bp long. Phylogenetic analyses using BI and ML (Figure 2 Figure S3). Some species groupings are evident within Taxus but these are not strongly supported.
of Pseudotaxus to Taxus is strongly supported when the tree is mid-point rooted or rooted on any of the remaining genera. Despite the high support for the monophyly of genera, the relationships among genera are not well-supported, especially Austrotaxus, Amentotaxus, Cephalotaxus and Torreya. Phylogenetic analysis of a reduced trnL + trnL-F dataset, including only Taxus and Pseudotaxus was undertaken to examine infrageneric patterns in Taxus (Supplementary Figure S3). Some species groupings are evident within Taxus but these are not strongly supported. In contrast, the phylogenetic analyses with nrITS provide well-resolved and better-supported trees ( Supplementary Figures S4 and S5). The ITS matrix with 119 species was 1150 bp long with 333 segregating sites. A combined analysis was therefore undertaken with the combined ITS, trnL intron and trnL-F sequences using both ML and BI (86 sequences, matrix 1511 bp long and 361 segregating sites). The BI tree is shown in Figure 3 with the bootstrap values from the ML analysis given below the branches and BI posterior probabilities above the branches. The combined tree shows that Taxus brevifolia is sister to T. globosa and T. floridana (BS = 86; PP = 1). Taxus fuana groups with T. contorta (=T. wallichiana ssp. contorta) (BS = 99; PP = 1). Taxus wallichiana is resolved as monophyletic, but its varieties T. wallichiana var. mairei, var. chinensis and var. wallichiana are not monophyletic, although, individuals within variety do generally group together and there is not firm support from PP or BS In contrast, the phylogenetic analyses with nrITS provide well-resolved and better-supported trees ( Supplementary Figures S4 and S5). The ITS matrix with 119 species was 1150 bp long with 333 segregating sites. A combined analysis was therefore undertaken with the combined ITS, trnL intron and trnL-F sequences using both ML and BI (86 sequences, matrix 1511 bp long and 361 segregating sites). The BI tree is shown in Figure 3 with the bootstrap values from the ML analysis given below the branches and BI posterior probabilities above the branches. The combined tree shows that Taxus brevifolia is sister to T. globosa and T. floridana (BS = 86; PP = 1). Taxus fuana groups with T. contorta (=T. wallichiana ssp. contorta) (BS = 99; PP = 1). Taxus wallichiana is resolved as monophyletic, but its varieties T. wallichiana var. mairei, var. chinensis and var. wallichiana are not monophyletic, although, individuals within variety do generally group together and there is not firm support from PP or BS for the lack of monophyly for these varieties. The relationships of T. baccata, T. canadensis, T. cuspidata are not well resolved and there is no evidence for their monophyly. for the lack of monophyly for these varieties. The relationships of T. baccata, T. canadensis, T. cuspidata are not well resolved and there is no evidence for their monophyly. The TCS networks for both ITS (Figure 4) and trnL + trnL-F (Supplementary Figure S6) support the BI and ML analyses of these genes, and again show the non-distinction of T. baccata, T. canadensis, T. cuspidata. Other network building methods Minimum spanning network and Median joining network [44,[46][47] analyses showed the same patterns (data not shown). The networks support the clear separation of T. wallichiana, on one side of the network, and an unresolved group of T. baccata, T. canadensis and T. cuspidata, on the other side. The other species are positioned between these groups. The ITS sequence haplotypes (numbers in Figure 4) are largely species specific except haplotype 1, 2, 11. Haplotype 2 includes T. floridana and T. globosa, haplotype 11 includes T. cuspidata and T. ×media and haplotype 1 is the most common sequence type including representatives from T. baccata, T. canadensis and T. cuspidata.  Figure S6) support the BI and ML analyses of these genes, and again show the non-distinction of T. baccata, T. canadensis, T. cuspidata. Other network building methods Minimum spanning network and Median joining network [44,46,47] analyses showed the same patterns (data not shown). The networks support the clear separation of T. wallichiana, on one side of the network, and an unresolved group of T. baccata, T. canadensis and T. cuspidata, on the other side. The other species are positioned between these groups. The ITS sequence haplotypes (numbers in Figure 4) are largely species specific except haplotype 1, 2, 11. Haplotype 2 includes T. floridana and T. globosa, haplotype 11 includes T. cuspidata and T. ×media and haplotype 1 is the most common sequence type including representatives from T. baccata, T. canadensis and T. cuspidata.
Sequence heterogeneity was detected in the uncloned ITS PCR products/sequences of T. ×media and T. ×hunnewelliana. Polymorphisms were detected at several sites and these can be mapped to the corresponding bases in their parental sequences ( Figure 5). For example, T. ×media has both a C and a G at position 142. This polymorphism is explained by the presence of the C in one putative parent (T. cuspidata) and G in the other parent (T. baccata). Similar polymorphisms can be seen for T. ×hunnewelliana.     Leaf anatomical characters were also recorded to assess their utility in species differentiation. The results were inconclusive (Supplementary Table S2) as Farjon and Spjut's keys lead to different results. For example, sample P14 which is labelled as T. brevifolia was keyed in situ to T. cuspidata, using Farjon [35]. However, using Sput's key and the stomata impressions it was keyed to T. recurvata, which according to The Plant List [9] is a synonym of T. baccata. Leaf anatomical characters were also recorded to assess their utility in species differentiation. The results were inconclusive (Supplementary Table S2) as Farjon and Spjut's keys lead to different results. For example, sample P14 which is labelled as T. brevifolia was keyed in situ to T. cuspidata, using Farjon [35]. However, using Sput's key and the stomata impressions it was keyed to T. recurvata, which according to The Plant List [9] is a synonym of T. baccata.

Taxaceae Phylogeny
Our results show support for the recognition of six genera, Amentotaxus, Austrotaxus, Cephalotaxus, Pseudotaxus, Taxus and Torreya in Taxaceae, as each of the genera are clearly resolved as monophyletic ( Figure 2). This finding supports the taxonomic treatment of Christenhusz et al. [48] who combined Cephalotaxaceae and Amentotaxaceae with Taxaceae to include 28 species in the same six genera. It is also consistent with the phylogenetic study of Price [49], based on rbcL and matK that showed Taxaceae to be monophyletic when Cephalotaxus and Amenotaxus are included. Other authors have chosen to separate these genera into three families. For example, Hao et al. [33] used a phylogenetic analysis of the sequences of five chloroplast (matK, rbcL, trnL, trnL-trnF spacer) and one nuclear molecular marker region (ITS), both individually and in combination, to support the division of the species into three allied families, Taxaceae, Cephalotaxaceae and Amentotaxaceae.
Pseudotaxus and Taxus are closely related sister genera with their only known morphological distinction being the difference in colour in the stomatal bands and aril. Pseudotaxus has white arils [50] and is native to south eastern China (north Fujian, north Guangdong, Guangxi, Huan, Jiangxi and Zhejiang) [51][52][53]. The sister status of Pseudotaxus to Taxus is also strongly supported in this study but there is not enough support for the formulation of an infra-generic classification of Taxaceae. Elpe et al. [54] conducted a phylogenetic study, which fully supported Cephalotaxus as a sister group to Taxaceae. Within Taxaceae, two tribes are supported as monophyletic, Taxeae and Torreyeae. Taxeae consists of Austrotaxus, Pseudotaxus and Taxus, while Torreyeae is comprised of Amentotaxus and Torreya. Ghimire and Heo [55] adopting a cladistic approach to investigating the Taxaceae, which also found that Pseudotaxus is a sister of Taxus with high bootstrap support.

Taxus Phylogeny
The phylogenetic and network analyses reported here were based on the nuclear ribosomal ITS and plastid trnL + trnL-F regions, which have been used in the past to investigate the phylogenetics of the genus but with lower sampling. Li et al. [10] used sequences of the plastid trnL-F regions to show genetic diversity among some Taxus species. However, their sampling was limited to one Pseudotaxus chienii sample and 14 Taxus samples (T. baccata, T. brevifolia, two T. canadensis, T. chinensis, T. chinensis var. mairei, two T. cuspidata, T. globosa, T. ×hunnewelliana, T. ×media, T. wallichiana). Furthermore, Shah et al. [56] used sequences of ITS and trnL-F along with principal component analysis to determine the taxonomic and geographical boundaries between Taxus species, but they only included T. baccata, T. wallichiana and T. fuana. Although, considerable plastid variation has been recorded within, and among species, in other groups of plants [57,58], we found that the plastid trnL-F region provided little molecular variation for species differentiation.
The phylogenetic analyses with nrITS and combined analyses with plastid DNA reported here provide most resolution and support for groupings (Figures 3 and 4; Supplementary Figures S4 and S5) and indicates that Taxus brevifolia groups with T. globosa and T. floridana and that Taxus fuana groups with T. contorta (=T. wallichiana ssp. contorta). Taxus wallichiana is resolved as monophyletic but its varieties T. wallichiana var. mairei, var. chinensis and var. wallichiana are not monophyletic, although individuals within variety do generally group together. Taxus baccata, T. canadensis, and T. cuspidata are closely related but are not well resolved. There is little evidence for their monophyly, except for one group of T. canadensis (in the ITS tree). The networks support the clear separation of T. wallichiana from an unresolved group of T. baccata, T. canadensis and T. cuspidata on the other side. The other species are found between these groups. The ITS sequence haplotypes are largely species specific except sequence 1, 2 and 11. Sequence 11 includes T. cuspidata and T. ×media, sequence 1 is the most common haplotype including representatives from T. baccata, T. canadensis, T. cuspidata and sequence 2 includes T. floridana and T. globosa.

Species Delimitation
Some authors have argued that all Taxus should be combined into a single species [2][3][4][5], but our results, combined with morphological and geographical evidence, support the division of Taxus. If we compare the species distribution map with the TCS network (Figure 1), we can see that it is possible that the groupings can partially be explained by historical biogeographical processes, including continental drift and realms. The TCS network groups T. baccata, T. canadensis, T. cuspidata and hybrids together, and these species grow in broadly similar biomes and latitudes. Taxus brevifolia, T. floridana and T. globosa are closely related in the TCS network and are all North American. One might expect that T. canadensis would be included in this group. However, if we take continental drift into account, it is not surprising that T. canadensis groups closely with T. baccata, as their separation could have been caused by vicariance as North America separated from Eurasia during the Cenozoic era. Similar patterns are known for other tree groups, such as Platanus [59]. Taxus wallichiana and T. sumatrana group together and are Asian, and Indonesian, respectively. It is possible that they were formed via allopatric speciation from a common ancestor, such as T. wallichiana diversifying into Indonesia to become T. sumatrana. Alternatively, this pattern could be explained by long distance dispersal (e.g., via zoochory) and subsequent speciation. Female yews produce fruit which are consumed by birds and disperse the seed intact in their faeces [60]. The fruits are eaten by several birds including winter flocking members of the thrush family as well as being hoarded and eaten by rodents [61]. However, Lavabre and García [62] did a study of the seed dispersal patterns of T. baccata across Spain and showed that the spatial distribution of the seeds in the landscape was heterogenous with the majority of the seeds consistently dispersed into forested microhabitats and almost none outside the forest. The results suggest that this generalized spatially restricted dispersal contributes to the lack of population range expansion. This could explain the geographical separation between T. brevifolia and T. canadensis by the American prairies; and T. baccata with T. cuspidata by the European and Russian steppes. A circumpolar distribution map of Taxus species from Hultén and Fries [63] indicates interglacial records of Taxus on the European steppes. These fossil records could show a link between T. baccata and T. cuspidata.
Taxus baccata, T. canadensis and T. cuspidata are problematic to differentiate using DNA sequence evidence alone but are sufficiently distinct in morphology and geographical distribution to merit species status. Other studies have provided evidence for the species status of several taxa [64]. For example, Shah et al. [56] strongly supported the distinctness of T. baccata, T. wallichiana and T. fuana. Spjut [8] and Spjut [65] classified 24 species and 55 varieties into three groups according to differences in leaf epidermal and stomatal features recognizing; (1) a Wallichiana group with subgroups wallichiana and chinensis; (2) a Baccata group with subgroups baccata and cuspidata and (3) a Sumatrana group (not divided). Our DNA studies and assessments of leaf anatomy (Supplementary Table S2; Supplementary Figure S7a-e) showed support for the Groups 1 and 2 of Spjut, but not Group 3. Some of the problems with the monophyly of species in the ITS tree could be explained by difficulties in taxon identification on the basis of morphology. Thus, a species identified as T. canadensis on the basis of morphology might actually be a T. baccata on the basis of DNA or anatomical evidence. A comprehensive paper on DNA barcoding in Euroasian Taxus has been published by Liu et al. [64], which highlights the use of both, ITS and trnL-F regions for correct taxon identification.
A subsample of the Taxus samples collected in the National Botanic Gardens, Glasnevin, Ireland, were identified to species, in situ, according to the key provided by Farjon [35]. This was very difficult to do as the different species all look very similar, due to phenotypic plasticity. For example, one of Farjon's characters for identification is the leaves on the lateral branchlets arranged or not arranged in a V-formation. I It was very difficult to assess some samples in this field as the V-formation was not very convincing in some specimens. Other morphological characters frequently used to identify Taxus species, include leaf shape and size, leaf buds and scales and several leaf anatomical characters [6,35,65]. Spjut [8], Spjut [65] used leaf anatomical characters to assist in species identification. The leaf samples were taken in our studies and epidermal cell patterns, and rows of stomata and papillae were visualized using cellulose acetate impressions. These characters were used in association with the key of Spjut [65] to identify the species. However, it is very easy to misinterpret Spjut's key, as phrases were used which could be easily misinterpreted, such as "papillae are nearly medial" and leaves are "usually revolute". These are not distinct traits so could be interpreted differently from person to person. Also, in some samples, the midrib was mostly smooth with papillae only present in parts of the midrib. Elpe et al. [54] developed a new identification key, based on leaf anatomical characters, using fluorescence microscopy. They found the presence of papillae on the abaxial midrib and on the adaxial leaf surface of T. brevifolia, to be a useful tool to separate T. brevifolia, T. floridana, T. globosa and T. wallichiana from other species. However, no differences were found between species, which had a papillose midrib, nor species which lacked this character. The key they produced does group T. baccata with T. canadensis and T. cuspidata, which further confirms our results. More data is needed for some species and this is important for conservation and utilization through horticulture or medicine. Therefore, we can conclude that it is insufficient to rely solely on morphological or anatomical characters for taxonomic identification of Taxus species and that those data need to be supplemented with DNA sequencing for the highest accuracy in species delimitation.

Hybridization
A further complication in phylogenetic reconstruction and species identification is ITS copy repeat heterogeneity. Repeat units of nrDNA are typically homogenized by concerted evolution, so that only one predominant copy is present [31,66,67]. This can cause complications in the interpretation of ITS sequences from closely related taxa. For example, in a hybrid line that has undergone subsequent cycles of sexual reproduction, the process of concerted evolution may homogenize copy types but sometimes favors one parental type over the other [30]. In the case of F1 hybrids, concerted evolution could not have occurred by unequal crossing over, and two copy types, corresponding to the two parental species, might be detectable. However, some degree of concerted evolution may have occurred by gene conversion in non-F1 material. Nuclear DNA sequences, such as ITS, are also subject to recombination and, following a number of generations, individual repeats of the ITS sequence cannot only vary from each other, but can also become highly heterogeneous themselves [31]. The repeat units can, therefore, become a mosaic of nucleotides from both, parental types, such that the original types are not easily distinguished [68]. We found that many ITS PCR products could not be sequenced without molecular cloning, which supports the evidence for considerable sequence heterogeneity within some individuals. However, the heterogeneity also presented an opportunity to study hybridization because different repeat types could be assigned to different parents of putative hybrid taxa (T. ×media and T. ×hunnewelliana).
Sequence heterogeneity was detected in the uncloned ITS sequences of T. ×media and T. ×hunnewelliana. These putative hybrids had previously been determined from morphological studies Collins et al. [1]. Polymorphisms were detected at several nucleotide sites and these can be mapped to the corresponding bases in their parental sequences ( Figure 5) or cloned sequences from the same amplicon. This provides evidence in support of their hybrid status. Taxus ×hunnewelliana is a hybrid of T. cuspidata × T. canadensis and T. ×media is a hybrid of T. baccata × T. cuspidata. This finding supports other work, especially by Collins et al. [1], but needs to be interpreted carefully given the lack of resolution in the T. baccata, T. canadensis and T. cuspidata group. Thus, given our findings and those of Collins et al. [1], it seems likely that hybridization and introgression is common among these three species and that they are not entirely distinct from each other.

Conclusions
We recognize nine Taxus species, based on our results and the findings of others (T. baccata, T. brevifolia, T. canadensis, T. cuspidata, T. floridana, T. fauna, T. globosa, T. sumatrana and T. wallichiana) have determined that some broad species groupings are consistent with their historical biogeography. We have also found evidence of considerable evolutionary reticulation that complicates the taxonomic understanding of the group. A broad phylogenetic framework of Taxus has, therefore, been provided to help guide comparative biological studies on the genus. For example, an accurate phylogeny is required to understand the evolution of the genes determining paclitaxel production (syn. Taxol), a chemical used for the treatment of ovarian, breast and lung cancer [69][70][71]. The results are also important from a conservation perspective and horticultural trade where clear taxonomic understanding is required [20,[72][73][74][75].
Supplementary Materials: The following are available online at http://www.mdpi.com/2311-7524/6/1/19/s1, Figure S1: Bayesian inference phylogenetic reconstruction of Taxaceae based on trnL and trnL-F plastid DNA Figure S2: Maximum likelihood phylogenetic reconstruction of Taxaceae based on trnL and trnL-F plastid DNA, Figure S3: Bayesian analysis of trnL-F data including Taxus and Pseudotaxus only, Figure S4: ML bootstrap consensus tree of nrITS sequence data for Taxus and Pseudotaxus only, Figure S5: Bayesian analysis of nrITS sequence data for Taxus and Pseudotaxus only, Figure S6: trnL intron and trnL-F haplotype TCS analysis, Figure S7: Leaf anatomy of Taxus. Table S1a,b: Sequences used from GenBank for a) nrITS and b) trnL-F, Table S2: Leaf anatomy data for Taxus species.