Complete Chloroplast Genome Sequence and Phylogenetic Inference of the Canary Islands Dragon Tree ( Dracaena draco L.)

: Dracaena draco , which belongs to the genus Dracaena , is an endemic succulent of the Canary Islands. Although it is one of the most popular and widely grown ornamental plants in the world, little is known about its genomic variability. Next generation sequencing, especially in combination with advanced bioinformatics analysis, is a new standard in taxonomic and phylogenetic research. Therefore, in this study, the complete D. draco chloroplast genome (cp) was sequenced and analyzed in order to provide new genomic information and to elucidate phylogenetic relationships, particularly within the genus Dracaena . The D. draco chloroplast genome is 155,422 bp, total guanine ‐ cytosine (GC) content is 37.6%, and it has a typical quadripartite plastid genome structure with four separate regions, including one large single copy region of 83,942 bp length and one small single copy region of 18,472 bp length, separated by two inverted repeat regions, each 26,504 bp in length. One hundred and thirty ‐ two genes were identified, 86 of which are protein ‐ coding genes, 38 are transfer RNAs, and eight are ribosomal RNAs. Seventy ‐ seven simple sequence repeats were also detected. Comparative analysis of the sequence data of various members of Asparagales revealed mutational hotspots potentially useful for their genetic identification. Phylogenetic inference based on 16 complete chloroplast genomes of Asparagales strongly suggested that Dracaena species form one monophyletic group, and that close relationships exist between D. draco , D. cochinchinensis and D. cambodiana . This study provides new and valuable data for further taxonomic, evolutionary and phylogenetic studies within the Dracaena genus.


Introduction
The genus Dracaena Vand. ex L. (Asparagaceae, Nolinoideae) consists of about 190 species [1][2][3]. The Dracenoid clade contains species belonging to genera, which now are incorporated into the Dracaena genus, but for a long time were treated as separate genera, such as Sansevieria Petagna with succulent leaves and mesophytic Pleomele Salisbury [3,4]. The morphology of the inflorescences of these two genera is very similar, but they are different from flowers of the species of dracaenas called the dragon trees [5]. This group contains 11 species, which are distinguished by succulent leaves and production of a red resin known as dragon blood [6]. These distinctive plants, which are tertiary relicts, are of great interest due to their cultural heritage to humanity, important ecological role as umbrella species [6], and the fact that they do not form a strongly supported monophyletic clade within Dracaena sensu lato in molecular studies, despite similar morphology [1,4]. The first described species of the whole Dracaena genus and the most famous dragon tree is Dracaena draco L., which occurs in a subtropical climate in thermo-sclerophyllous zones in Morocco and Atlantic islands including the Canary Islands, Cabo Verde and Madeira. Due to its spectacular umbrella-like habit and excretion of a red resin, known as a dragon blood, it is called the Canary Islands dragon tree. Its natural distribution on the islands and the African continent was a subject of numerous studies [5,[7][8][9][10][11][12][13]. Except the typical subspecies D. draco subsp. draco from the Canary Islands, two more were described: D. draco subsp. ajgal Benabid & Cuzin from Morocco [8] and D. draco subsp. caboverdeana Marrero Rodr. & R. Almeida from Cabo Verde [12]. Most previously conducted studies on D. draco focused on its habit, growth and morphology [5,9,12,[14][15][16][17][18][19], leaf anatomy and function [20][21][22], formation and structure of vascular bundles [23][24][25] as well as root growth [26,27]. Many studies have concerned its flowers [28], pollen [29], seed propagation [13] or formation of the resin and its ecological significance [30,31]. D. draco being a plant icon of the Canary Islands, is widely cultivated as an ornamental plant and its resin has a great significance for human till today [31]. According to the International Union for Conservation of Nature, the current status of D. draco was defined as vulnerable [32].
In contrast to the works cited above, very few genetic studies have been performed on D. draco. Moreover, these studies were not very extensive and focused on a small number of DNA regions (mainly barcodes) [1,4,33,34]. So far, there are no more comprehensive genomic data on D. draco.
Next generation sequencing, especially in combination with advanced bioinformatic analysis, offers enormous opportunities and is becoming the new standard in biological research. Comparative analysis of complete chloroplast genomes (cp) not only allows advanced phylogenetic reconstruction [35,36], but also allows them to be used as barcode supercodes to distinguish and identify different taxa. This approach can be very useful especially in the study of closely related or recently divergent species [37].
The concept of using the entire chloroplast genome as a superbarcode or for reconstruction of phylogenesis is not new [38]. However, due to the decreasing costs of analysis, this approach is becoming a commonly used procedure and sets a new standard in taxonomic and phylogenetic studies, offering excellent genetic resolution. This approach also makes it possible to solve many complex taxonomic problems [39]. Furthermore, analysis of complete genomes is also recommended in studies of the genus Dracaena [40]. Recently, extensive genetic research has been conducted using next generation sequencing on six species of the genus Dracaena, mainly Asian [40]. To date, no similar genomic data have been published for D. draco. Therefore, the structure and organization of its chloroplast genome, as well as the level of genetic variation, remain unknown. In addition, phylogenetic relationships between different representatives of the genus Dracaena should include D. draco, and should also be clarified using data from high-performance techniques.
Therefore, our main aims in this study were: (1) sequencing, analyzing and characterizing the complete D. draco chloroplast genome using a next generation sequencing platform and bioinformatics tools; (2) conducting whole genome comparative analysis with published chloroplast genomes of other representatives of the genus Dracaena; (3) selecting mutational regions of chloroplast genome useful in identifying species in the order Asparagales; and (4) determining the position of D. draco within the genus based on the complete chloroplast genome sequence and phylogenetic inference.
The results obtained in this work provide a solid theoretical basis for future taxonomic and phylogenetic studies of the Dracaena genus. They also enable detailed analysis of D. draco cpDNA genetic resources and thus open the way to developing a program for the conservation of genetic resources of this important species.

Plant Material, DNA Extraction and Sequencing
Fresh and healthy leaves of Dracaena draco were collected from the Botanical Garden of the Adam Mickiewicz University in Poznań (Poland) (52º 25′N, 16º 53′E) from a specimen cultivated there since 1986 (collection number I_I005_001_0000_6986_0357) and stored at 4 °C until DNA extraction. Genomic DNA was isolated using the CTAB method [41]. The quality and integrity of isolated DNA were determined using agarose gel electrophoresis and measurement on a NanoDrop spectrophotometer (Thermo Fisher Scientific, Carlsbad, California, United States). Ion Torrent PGM libraries were made using the NEBNext Fast DNA Fragmentation & Library Prep Set for Ion Torrent (New England BioLabs), according to the manufacturer's instructions. Selection of the desired size of DNA fragments was performed by 2% agarose gel electrophoresis using the E-Gel Precast Agarose Electrophoresis System (Thermo Fisher Scientific). The library was sequenced on the Ion Torrent PGM platform located in the Laboratory of Molecular Biology, Faculty of Biology, Adam Mickiewicz University, Poznań (Poland).

Genome Assembly, Annotation and Identification of Simple Sequence Repeats
BBDuk Adapter/Quality Trimming V. 35.82 available in Geneious Prime 2020.0.49.1.7 [42] was used to filter low quality reads and trim low-quality ends and adapters. The filtered reads were de novo assembled into contigs using Geneious Assembler on default options with merging homopolymer variants. Contigs were mapped to the reference genome Dracaena cambodiana (NC_039776) using Geneious Mapper with minimum mapping quality: 30. Reads, which mapped to the reference genome, were used for the de novo assembly of the complete chloroplast genome of D. draco. The reference genome assembly database was created based on the available sequential data in the NCBI (National Center for Biotechnology Information) database for other Dracaena representatives, i.e., D. angustifolia (MN200193), D. cambodiana (MN200194), D. cochinchinensis (MN200195), D. elliptica (MN200196), D. hokouensis (MN200197) and D. terniflora (MN200198). Assembled genomes were initially annotated using CPGAVAS2, an integrated plastome sequence annotator [43], and GeSeq [44]. Location of large single copy region (LSC) and small single copy region (SSC) as well as calculation of GC content was done in Geneious Prime 2020.0.4 9.1.7 [42] by comparison with homologous sequences available to other Dracaena representatives. Transfer RNAs were also checked with tRNAscan-RE v2.0.3. [45] incorporated in GeSeq [44] using default settings. OrganellarGenomeDRAW (OGDRAW) version 1.3.1 [46] was used to draw a circular map chloroplast genome of D. draco. The complete D. draco chloroplast genome sequence has been deposited in GenBank under accession number MN990038.
Identification of divergent hotspots was performed separately for representatives of the genus Dracaena and the Asparagales order based on seven and fifteen chloroplast genomes, respectively. The relevant chloroplast genomes were aligned using MAFFT 7.450 default option (Algorithm = Auto; Scoring matrix = 200PAM/k = 2; Gap openpenalty: 1.53 and Offset value: 0.123) [49], and then nucleotide diversity (Pi) were calculated through sliding window analysis using DnaSP version 6 [50]. The window length was set to 600 bp, with a step size 200 bp. The diversity thresholds for Dracaena 0.0152 and for Asparagales 0.0511 were calculated by sum of the average and double the standard deviation [51]. Regions with a level of nucleotide diversity higher than these thresholds were recommended as highly variable regions.
Evolutionary divergence between Dracaena species was estimated by calculating genetic distances using the p-distance method in MEGA X [52] using default settings (Substitution Type: Nucleotide; Substitution to Include: d: Transitions + Transversions; Rates among Sites: Uniform Rates; Gaps/Missing Data Treatment: Complete deletion) with 1000 bootstrap replicates.

Phylogenetic Inference
Phylogenetic inference was constructed by maximum likelihood (ML) and Bayesian inference (BI) analysis using 16 complete sequences of chloroplast genomes of various Asparagales representatives (including data obtained for D. draco in this study). The list of species included in the study along with GenBank accession numbers is given in Table 1. For a better elucidation of the tree topology, both closely related taxa, i.e., seven Dracaena species, and further related taxa, i.e., Aloe vera or Yucca filamentosa, were selected as the outgroups. Complete chloroplast genomes were aligned with MAFFT 7.450 using default settings (Algorithm = Auto; Scoring matrix = 200PAM/k = 2; Gap openpenalty: 1.53 and Offset value: 0.123) [49]. A General Time Reversible + Proportion Invariation + Gamma nucleotide substitution model (GTR + I + G) were selected according to Akaike`s information criterion (AIC) [53] with MEGA X [52], as the best substitution model for the ML and BI analyses. The ML analyses were conducted in RaxML v8.2.11 [54], with 1000 rapid bootstrap replicates along with a search for the best-scoring ML tree in every run and parsimony random seed set to 10.
BI analyses were conducted using MrBayes v 3.2.6 [55,56]. The Markov Chain Monte Carlo (MCMC) algorithm was run for 100,000 generations and the trees were sampled every 100 generations. The remaining analysis parameters were set as follows: heated chains: 4 and heated chain temperature: 0.2; random seed: 23,364; unconstrained branch lengths: GammaDir (1;0,1;1;1) with shape parameter exponential: 10. The first 25% of the trees were discarded as a burn-in and remaining trees were used to generate the consensus tree, including clade posterior probability (PP). Aloe vera was used as an outgroup. Convergence was determined by examining the average standard deviation of the split frequencies (<0.01).

Genome Features
The Dracaena draco chloroplast genome ( Figure 1) has a typical quadripartite structure that is characteristic and observed in many flowering plants [35,36,57]. It contains a pair of inverted repeat regions (IRa and IRb) that comprise 26,504 bp each. The two IR regions divide the genome into a large single copy (LSC) region and a small single copy (SSC) region. The LSC region is 83,942 bp, whereas the SSC region is 18,472 bp. The complete chloroplast genome of D. draco is 155,422 bp in length (GenBank acc. MN990038). The total percentage of GC content is 37.6% and ranges from 31.2% in the SSC region, through 35.6% in the LSC region to 42.9% in the IR regions. Slightly higher GC content in IR regions is likely due to duplicate ribosomal RNA genes in these regions [58]. A total of 132 genes were annotated in the D. draco chloroplast genome (113 unique genes excluding duplicate ones), including 86 protein-coding genes, 38 transfer RNA genes, and eight ribosomal RNA genes ( Figure 1, Table 2).

No. Classification of Genes
Name of Genes Number   [40], in terms of size, length of LSC, SSC and IR regions, number of predicted genes or GC content (Table 3).
Some studies indicate that evolution phylogenetic relationships [40,59] can be analyzed based on GC content. However, in the case of Dracaena species, the GC content is at a similar level (from 37.5% to 37.6%), so it is difficult to draw far-reaching conclusions about this type of phylogenetic nature only on this basis. Simple sequence repeats (SSRs or microsatellites) are very often used in population, ecological and conservation genetics as effective molecular markers mainly due to the high level of genetic polymorphism detected by them and wide distribution throughout the whole chloroplast genome [60][61][62][63]. Initially, isolation and characterization of microsatellite markers and the development of primers amplifying these regions was quite a laborious and expensive task. Therefore, an attempt was made to design universal primers that could be used for many species on a cross-amplification basis [64,65]. Chloroplast genome sequences are generally highly conserved; hence there is a good chance that such primers amplifying repeating regions may also be used in the analysis of several other related species. Some studies support the high effectiveness of such procedures (especially in the case of chloroplast SSRs), whereas others do not (in the case of nuclear SSRs) [66,67].
In this study, a total of seventy-seven SSRs of at least 10 bp length were detected in the D. draco chloroplast genome. The number of SSRs detected in this study is similar to previously published results obtained for six other Dracaena species, in which the number of SSRs ranged from 64 for D. terniflora to 71 for D. elliptica ( Table 3). Most of identified SSRs (Table 4) had a mononucleotide motif (89.61%). There were definitely fewer repeats with a di-and trinucleotide motifs (7.79% and 2.60%, respectively).
Our results are also confirmed by the observations from other studies in which SSRs in chloroplast genomes have a motif composed mainly of short polyadenine (polyA) or polythymine (polyT) repeats and much less often contain guanine (G) or cytosine (C) tandem repeats [59]. The SSRs identified in this study can be used in the further genetic studies on D. draco in order to characterize genetic resources and geographical patterns of diversity.

Genome Comparative Analysis and Identification of Divergent Hotspots
The D. draco chloroplast genome was aligned with the chloroplast genomes of six other other  In general, whole-genome alignment of the chloroplast sequences revealed no rearrangement or inversion events among Dracaena chloroplast genomes and confirm the close evolutionary relationships between Dracaena species. However, a common break was observed in the regions (~87,000−107,000; ~130.00−150.00), characterized by the high variation in gene sequence among the aligned chloroplast genomes. Our findings are consistent with previous studies [40], which also noted high sequence homology (with a few exceptions) among Dracaena chloroplast genomes.
The p-distance values calculated as an estimator of evolutionary divergence (Table 5) Table 5. Estimates of evolutionary divergence between Dracaena species. The number of base differences per site from between sequences are shown. Standard error estimate(s) are shown above the diagonal and were obtained by a bootstrap procedure (1000 replicates). This analysis involved 7 nucleotide sequences. All positions containing gaps and missing data were eliminated (complete deletion option). There were a total of 154,129 positions in the final dataset. Evolutionary analyses were conducted in MEGA X [52]. DnaSP was used to carry out two sliding window analyses in order to identify mutational regions. One analysis concerned Dracaena species (Figure 3A), while the other concerned representatives of Asparagales ( Figure 3B). The results in Figure 3A clearly show that for Dracaena species there were six divergent hotspots with a high Pi value (>0.0152), i.e., psbI-atpA, petA-psbJ, clpP, rps12-ndhF, ndhF-ccsA and ycf1-rps12. For Asparagales, a total of ten unique mutational regions with a high Pi value (>0.0511) were detected, i.e., psbK-rps16, rpoB-petN, psbM-psbD, ndhK-atpE, petA-psbJ, ycf2-ndhB, rps12-ndhF, ndhI-ndhA, ycf1 and ndhB-ycf2 ( Figure 3B). The average value of nucleotide diversity (Pi) was 0.00420 and 0.02133 for Dracaena and Asparagales, respectively. In summary, the Pi value was over five times higher in Asparagales than in Dracaena. This result is in line with expectations because the analysis included more distant species representing 10 genera and two families, not just seven closely related species from the genus Dracaena. However, a previous study on Asian Dracaena species also indicated a low level of genetic polymorphism [40]. The chloroplast DNA regions selected in this study can be used as specific barcodes dedicated for further taxonomic studies of Asparagales members. A species-specific barcode is defined as a fragment of DNA sequence that has a sufficiently high mutation rate to enable species identification within a given taxonomic group [38]. The ycf1 region seems particularly interesting in this respect for Asparagales. Several studies show that this region has a high discriminatory power and much more potential than universal barcodes, including for species other than Dracaena [37,68,69].

Phylogenetic Inference
Chloroplast genome sequences have been successfully used to study the phylogeny of many different plant groups [35,36,40,57]. In this study, we were particularly interested in the position of D. draco within the genus Dracaena. Therefore, phylogenetic trees were constructed using ML and Bayesian algorithms and nucleotide sequences of chloroplast genomes of sixteen Asparagales representatives. As shown in Figure 4, both obtained ML and Bayesian phylogenetic trees clearly indicated that all species representing the Dracaena genus formed a separate cluster within Asparagales. However, this cluster consists of two groups, confirmed by a very high probability.
One on them includes D. draco obtained in this study and, originating from one node, a sister clade composed of D. cochinchinensis and D. cambodiana. The second comprises D. hokouensis, D. elliptica and the sister clade of D. terniflora and D. angustifolia. The second major group also consists of two smaller clades with representatives of six other genera: Convallaria, Liriope, Maianthemum, Nolina, Polygonum and Rohdea. The first sister clade, which includes Convallaria keiskei and Rohdea chinensis, is characteristic of both trees. The other species form different sister groups depending on the method. In the Bayesian tree Liriope spicata groups in a sister clade with Nolina atopocarpa, while in the ML tree L. spicata is sister to Maianthemum bicolor and N. atopocarpa is sister to Polygonatum enophyllum. It is worth noting that the probability of a correct tree solution is higher in the Bayesian method than in ML. However, to discuss the phylogenetic relationship of these genera, a larger number of representatives is needed. Two species of the Asparagaceae family, i.e., Asparagus officinalis (genus Asparagus) and Yucca filamentosa (genus Yucca), were outside the two main distinguished groups both in ML and Bayesian trees. Our results are consistent with previous phylogenetic studies of the Dracaena genus, both those based on complete chloroplast genomes but not including D. draco [40] and those based on selected cpDNA regions in which D. draco was included [34]. Our analyses increase the knowledge of Dracaena's phylogeny and provide valuable genetic information for future research into the evolutionary history of this group.

Conclusions
In this study, we reported and analyzed the complete chloroplast genome of D. draco. The structure of the chloroplast genome, its organization, length as well as the order and number of genes are similar to those recently published for six Asian Dracaena species. Due to the high level of polymorphism and its length, the ycf1 region appears to be very useful for identifying taxa within Asparagales, though not necessarily of the genus Dracaena. Conducted phylogenetic analyses revealed that D. draco is much closer to D. cochinchinensis and D. cambodiana than to D. terniflora, D. angustifolia, D. hokouensis and D. elliptica.