Complete Chloroplast Genome of Four Thai Native Dioscorea Species: Structural, Comparative and Phylogenetic Analyses

The chloroplast genomes of Dioscorea brevipetiolata, D. depauperata, D. glabra, and D. pyrifolia are 153,370–153,503 bp in size. A total of 113 genes were predicted, including 79 protein-coding sequences (CDS), 30 tRNA, and four rRNA genes. The overall GC content for all four species was 37%. Only mono-, di-, and trinucleotides were present in the genome. Genes adjacent to the junction borders were similar in all species analyzed. Eight distinct indel variations were detected in the chloroplast genome alignment of 24 Dioscorea species. At a cut-off point of Pi = 0.03, a sliding window analysis based on 25 chloroplast genome sequences of Dioscorea species revealed three highly variable regions, which included three CDS (trnC, ycf1, and rpl32), as well as an intergenic spacer region, ndhF-rpl32. A phylogenetic tree based on the complete chloroplast genome sequence displayed an almost fully resolved relationship in Dioscorea. However, D. brevipetiolata, D. depauperata, and D. glabra were clustered together with D. alata, while D. pyrifolia was closely related to D. aspersa. As Dioscorea is a diverse genus, genome data generated in this study may contribute to a better understanding of the genetic identity of these species, which would be useful for future taxonomic work of Dioscorea.


Introduction
Dioscorea L. is the largest genus in Dioscoreaceae, containing approximately 600 recorded species, widely distributed in the Southeast Asia, Africa, Central America, South America, and other tropical and subtropical regions [1][2][3]. Members of Dioscorea are generally known as yams, an important vegetatively-reproducing tuber crop that is a good subsistence starch crop [4,5]. While many Dioscorea species are part of a staple diet in many countries, some of them are non-edible, as they contain toxic compounds [6]. Among them, many are identified as good natural resources with medicinal properties [7][8][9][10]. However, due to it being a diverse genus, identification and classification of Dioscorea species has been a challenge to taxonomists; the genus is dioecious, has small flowers, and comes with great morphological variations [11,12].
To shed light on the taxonomic status of the species in this complicated genus via molecular approaches, several phylogenetic studies have been carried out using DNA fingerprinting techniques, such as amplified fragment length polymorphism [13], polymerase chain reaction-restriction fragment length polymorphism [14], random amplified polymorphic DNA [15], and simple sequence repeat [16], as well as the use of short gene loci derived from nuclear DNA, Pgi [17] and Xdh [18], and chloroplast DNA (cpDNA), atpB-rbcL, psaAycf 3, rbcL, rpl32-trnL, matK, trnH-psbA, and trnL-trnF [11,12,[19][20][21]. Although molecular markers provide some information on the taxonomy of Dioscorea, phylogenetic analyses are low resolution due to the limited data. Further studies to find high resolution molecular markers at the species level which lead to successful identification and phylogeny in the genus Dioscorea, are necessary [22]. Furthermore, the effort to perform molecular identification of Dioscorea species has been on-going [23][24][25]. Eventually, a study that utilized the highly variable regions in the cp genome of Dioscorea was proposed as a potentially useful marker for species delimitation and species identification among members of the complicated genus [3]. Despite the fact that studies on DNA barcoding in Dioscorea have been carried out to evaluate a suitable DNA barcode to discriminate the closely related species, the findings were inconclusive-only a limited number of samples were included in the study [26,27]. Note that Dioscorea is a diverse genus; thus, the work to barcode all species could be tedious and costly. For that reason, it is wise to look into informative sites in the cp genomes to aid in the barcoding effort of Dioscorea.
The rapid development of next-generation sequencing (NGS) platforms and bioinformatics tools in the last two decades has allowed the assembly and characterization of long sequences into complete organellar genomes to be conducted with ease [28,29]. In general, the chloroplast (cp) genome in angiosperms consists of a typical quadripartite structure, containing a pair of inverted repeats (IRs) that are separated by large single-copy (LSC) and small single-copy (SSC) regions [30]. The cp genome is generally maternally inherited, and has a genome size between 120 k and 170 k bp in length [22]. Its low rates of nucleotide substitutions and recombination make it suitable for use in phylogenetic studies of higher plants, thus resolving the complex evolutionary relationships in complicated genera [31,32]. On the other hand, complete cp genome sequences also enable researchers to understand various biological disciplines in plants, including gene families and functions, genome structure and evolution, phylogenomic relationship, etc. [33,34].
Using cpDNA is much preferred by researchers in phylogenetic studies, as demonstrated in Dioscorea; yet, studies have shown that the complete cp genome could increase phylogenetic resolution at low taxonomic levels when compared to the use of short gene sequences [22,35,36]. Owing to the need to reveal the phylogenetic relationships in Dioscorea at cp genome level, so far approximately 55 cp genome sequences, representing 35 Dioscorea species, have been made available in the NCBI GenBank database (as of January 2023). Despite the relevant amount of cp genomes that have been sequenced, the number of cp genomes reported for Dioscorea species was still less than 10% of the total species recorded in the genus. To expand the genetic information of Dioscorea, in this study, we sequenced anew and assembled the cp genome of four Dioscorea species that are native to Thailand. The assembled cp genome sequences of D. brevipetiolata, D. depauperata, D. glabra, and D. pyrifolia were characterized, and comparison analyses were conducted between the four species and other closely-related species. As a potential source of medicinal properties, we also identified several highly variable regions in the cp genome that could be developed into DNA markers. Phylogenomic analyses were also carried out to reveal the molecular placement of these species at cp genome level.

Next-Generation Sequencing, Genome Assembly, and Gene Annotation
A 350 bp paired-end library was prepared using a TruSeq DNA Sample Prep Kit (Illumina, San Diego, CA, USA) to obtain 150 bp pair-end reads. Next-generation sequencing was performed on the four collected species with an Illumina NovaSeq platform (Illumina, USA). The NGS QC Toolkit was used to trim off the adapter sequences [37] and the plastid genome was visualized using OrganellaGenomeDRAW v.1.3.1 [38]. The assembled cp genome was annotated, and the inverted repeat junctions were identified using GeSeq v.2.03 [39]. The circular cp genome was visualized using OrganellaGenomeDRAW v.1.3.1. The four Dioscorea species sequences of the annotated cp genome were deposited in the NCBI GenBank database under the accession numbers OL638495-OL638498.

Large Repeats and Simple Sequence Repeats (SSRs) Analysis
Large repeats, including the forward, palindromic, reverse, and complement repeats, were identified using REPuter [40], in which the minimum repeat size was set at 30 bp and the Hamming distance was set at 3. The SSRs present in the cp genome were identified using MISA-web [41]. The minimum number of repeat parameters were set at 10, 6, 5, 5, 5, and 5 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs, respectively.  [42] and the genes adjacent to them were identified. To ensure consistency in the annotation of gene content, the 25 downloaded cp genome sequences of Dioscorea were reannotated using GeSeq v2.03 [39] prior to junction analysis. Interspecific variation of the 25 species of Dioscorea at the cp genome level, including the four obtained from this study, was analyzed using mVISTA [43,44] with the Shuffle-LAGAN mode [45]. The cp genome of D. bulbifera was selected as the reference genome (Supplementary Materials, Table S1). Nucleotide diversity (Pi) in the LSC, SSC, and IR regions of the 25 species of Dioscorea was estimated using DnaSP v.6 [46]. The window length was set at 1000 bp, and 500 bp was selected for step size. The numbers of polymorphic sites and parsimony informative sites were also calculated.

Phylogenetic Reconstruction
Phylogenetic analysis was carried out based on the complete cp genome sequences of 37 species of Dioscoreaceae. Ten species-Burmania coelestis, B. cryptopetala, and B. disticha of Burmaniaceae, Diocoreales, as well as Croomia heterosepala, C. japonica, C. pauciflora, Stamonia japonica, S. mairei, S. tuberosa, and S. sessilifolia of Stemonaceae, Pandanales-were included as outgroups (Supplementary Materials, Table S1). All sequences were prepared by MEGA-X [47]. Multiple sequence alignment was performed using MAFFT v.7 [48] and phylogenetic trees were reconstructed based on two methods, maximum likelihood (ML) [49] and Bayesian inference (BI) [50]. The maximum likelihood was constructed using RAxML-HPC2 on XSEDE using a generalized-time-reversible (GTR) model with gamma (+G), and 1000 bootstrap replications were selected; for BI, the BI tree was constructed using MrBayes on XSEDE v.3.2.7a. A Markov chain Monte Carlo (MCMC) analysis was run for two million generations (Ngen = 2,000,000), with trees sampled every 100 generations. Both the ML and BI analyses were conducted using the pipelines available in the Cyberinfrastructure for Phylogenetic Research (CIPRES) Science Gateway v.3.3 [51]. Resulting trees were visualized using FigTree version 1.4.4 [52].

Repeat Sequences and SSR Analysis
A total of 90 large repeats were detected in four cp genome sequences, of which 11-14 were palindromic repeats and 9-11 were forward repeats. One large reverse repeat was identified, which was derived from D. glabra. The repeat length that was most abundant was 30-40 bp in length, followed by the length 41-50 bp. The repeat length that was recorded the least was 51-60 bp, of which only one was found in D. pyrifolia (Figure 2; Supplementary Materials, Table S4).

IR Expansion and Contraction
There were four boundaries located between the LSC-IR and SSC-IR regions in all 25 cp genomes. In general, the genes adjacent to the boundaries were similar in all cp genomes analyzed ( Figure 3). For the junction between the LSC and IRB regions (JLB), the rps19 gene was found crossing over from the IRB region into the LSC region for all species, except for D. zingiberensis; the rps19 gene of D. zingiberensis was placed in the LSC region and was 48 bp away from the boundary. On the other hand, the trnH genes, which were adjacent to JLB, were located in the IRB region in all species analyzed. For the junction between the SSC and IRB regions (JSB), two genes, trnN and ycf1, were placed next to the boundary. The trnN gene was located in the IRB region, while ycf1 was identified crossing over from the IRB region into the SSC region for all species analyzed. For the junction between the SSC and IRA regions (JSA), trnN was found intact in the IRA region, while the ndhF gene that was located in the SSC region was found crossing over  The SSR analysis of the four studied Dioscorea species revealed three SSRs: mono-, di-, and trinucleotides. Mononucleotides was the most-observed type in all four studied Dioscorea species, with A and T present, while C and G were absent. A type was found the most in D. depauperata and D. glabra at 19 SSRs, followed by D. brevipetiolata at 17 repeats and D. pyrifolia at 16 repeats. T type was found the most in D. brevipetiolata at 21 SSRs, followed by D. pyrifolia, D. depauperata, and D. glabra at 20, 16, and 16 SSRs, respectively. For dinucleotides, there was only TA in D. brevipetiolata with two SSRs, with D. depauperata, D. glabra, and D. pyrifolia at one SSR each. Concerning trinucleotides, there were ATA and TAT with one SSR in all four studied Dioscorea species (Figure 2; Supplementary Materials, Table S5).

IR Expansion and Contraction
There were four boundaries located between the LSC-IR and SSC-IR regions in all 25 cp genomes. In general, the genes adjacent to the boundaries were similar in all cp genomes analyzed (Figure 3). For the junction between the LSC and IRB regions (JLB), the rps19 gene was found crossing over from the IRB region into the LSC region for all species, except for D. zingiberensis; the rps19 gene of D. zingiberensis was placed in the LSC region and was 48 bp away from the boundary. On the other hand, the trnH genes, which were adjacent to JLB, were located in the IRB region in all species analyzed. For the junction between the SSC and IRB regions (JSB), two genes, trnN and ycf 1, were placed next to the boundary. The trnN gene was located in the IRB region, while ycf 1 was identified crossing over from the IRB region into the SSC region for all species analyzed. For the junction between the SSC and IRA regions (JSA), trnN was found intact in the IRA region, while the ndhF gene that was located in the SSC region was found crossing over JSA in the cp genomes of 10

Genomes Sequence Divergence among Dioscorea Species
Genome comparison was analyzed in 25 Dioscorea cp genomes, including the four studied species and the 21 Dioscorea species derived from the NCBI database, with D. bulbifera for reference. The results indicated that the IR regions were more highly conserved than the LSC and SSC regions, with variations located on LSC and SSC. Eight variation gaps were observed in the cp genomes alignment; namely, psbA (black arrow, A), trnK-UUU through trnQ-UUG (black arrow, B), trnS-GCU through trnG-UCC (black arrow, C), trnT-UGU through trnL-UAA (black arrow, D), accD through psaI (black arrow, E), psbE through petL (black arrow, F), petD (black arrow, G), and ccsA-trnL-UAG-rpl32-ndhF (black arrow, H). Variation gaps of trnK-UUU through trnQ-UUG, and trnS-GCU through trnG-UCC, were found in all Dioscorea cp genomes. Nine Dioscorea cp genomes had variation gaps at psbA and trnT-UGU through the trnL-UAA regions. Four Dioscorea cp genomes, D. collettii, D. quinquelobata, D. villosa, and D. zingiberensis had nucleotide divergence gaps at the accD through psaI regions. Sixteen Dioscorea cp genomes had variation gaps at psbE through petL region, while nucleotide divergence gaps in petD were found only in D. esculenta. Three Dioscorea cp genomes, D. colletii, D. quinquelobata, and D. villosa, had distinct gaps in the ccsA-trnL-UAG-rpl32-ndhF region. These regions had more than

Genomes Sequence Divergence among Dioscorea Species
Genome comparison was analyzed in 25 Dioscorea cp genomes, including the four studied species and the 21 Dioscorea species derived from the NCBI database, with D. bulbifera for reference. The results indicated that the IR regions were more highly conserved than the LSC and SSC regions, with variations located on LSC and SSC. Eight variation gaps were observed in the cp genomes alignment; namely, psbA (black arrow, A), trnK-UUU through trnQ-UUG (black arrow, B), trnS-GCU through trnG-UCC (black arrow, C), trnT-UGU through trnL-UAA (black arrow, D), accD through psaI (black arrow, E), psbE through petL (black arrow, F), petD (black arrow, G), and ccsA-trnL-UAG-rpl32-ndhF (black arrow, H). Variation gaps of trnK-UUU through trnQ-UUG, and trnS-GCU through trnG-UCC, were found in all Dioscorea cp genomes. Nine Dioscorea cp genomes had variation gaps at psbA and trnT-UGU through the trnL-UAA regions. Four Dioscorea cp genomes, D. collettii, D. quinquelobata, D. villosa, and D. zingiberensis had nucleotide divergence gaps at the accD through psaI regions. Sixteen Dioscorea cp genomes had variation gaps at psbE through petL region, while nucleotide divergence gaps in petD were found only in D. esculenta. Three Dioscorea cp genomes, D. colletii, D. quinquelobata, and D. villosa, had distinct gaps in the ccsA-trnL-UAG-rpl32-ndhF region. These regions had more than 50% different nucleotide sequences from D. bulbifera, which was used for reference ( Figure 4). Nucleotide diversity via sliding window analysis of the 25 cp genomes were compared in the LSC, IR, and SSC regions. Nucleotide variation was higher in the LSC and SSC than the IR regions, as IR regions have low nucleotide diversity. There were three highly nucleotide-divergent regions, called mutational hotspots, located in the LSC (A) and SSC (B, C) regions, showing a Pi value of >0.03 ( Figure 5; Supplementary Materials, Table S6). The first hotspot, A, covered the whole trnC-GCA gene; the second hotspot, B, was located on the ycf 1 gene; while the third hotspot, C, consisted of the rpl32 gene and the intergenic spacer region ndhF-rpl32.
Genes 2023, 14, 703 9 of 17 50% different nucleotide sequences from D. bulbifera, which was used for reference ( Figure  4). Nucleotide diversity via sliding window analysis of the 25 cp genomes were compared in the LSC, IR, and SSC regions. Nucleotide variation was higher in the LSC and SSC than the IR regions, as IR regions have low nucleotide diversity. There were three highly nucleotide-divergent regions, called mutational hotspots, located in the LSC (A) and SSC (B, C) regions, showing a Pi value of >0.03 ( Figure 5; Supplementary Materials, Table S6). The first hotspot, A, covered the whole trnC-GCA gene; the second hotspot, B, was located on the ycf1 gene; while the third hotspot, C, consisted of the rpl32 gene and the intergenic spacer region ndhF-rpl32.  light-blue bars represent tRNA and rRNA regions; gray arrows above the aligned sequences indicate the genes and their orientations; the x-axis represents the number of bases in aligned sequences; the y-axis represents the percent identity within 50-100%; black arrows indicate regions which have a crucial divergence in variations located on LSC and SSC. Region with high variation include psbA (black arrow, A), trnK-UUU-trnQ-UUG (black arrow, B), trnS-GCU-trnG-UCC (black arrow, C), trnT-UGU-trnL-UAA (black arrow, D), accD-psaI (black arrow, E), psbE-petL (black arrow, F), petD (black arrow, G), and ccsA-trnL-UAG-rpl32-ndhF (black arrow, H).

Figure 5.
Nucleotide diversity (Pi) comparing the cp genome sequences of the 25 Dioscorea species using sliding window analysis (window length, 1000 bp; step size, 500 bp); the x-axis indicates the position of the midpoint; the y-axis indicates the nucleotide diversity of each window.

Phylogenetic Analysis
As both the ML and BI trees displayed similar topology, only the ML tree is shown ( Figure 6). Based on the phylogenetic analysis reconstructed using the complete cp genome sequences, a completely resolved phylogenetic relationship was recorded among species of Dioscorea for the ML tree, but not for the BI tree. Divergence is considered reliable when the bootstrap support (BS) value is equal to or more than 75%, while the posterior probability (PP) value is equal to or more than 0.90, as indicated on the branch node.

Phylogenetic Analysis
As both the ML and BI trees displayed similar topology, only the ML tree is shown ( Figure 6). Based on the phylogenetic analysis reconstructed using the complete cp genome sequences, a completely resolved phylogenetic relationship was recorded among species of Dioscorea for the ML tree, but not for the BI tree. Divergence is considered reliable when the bootstrap support (BS) value is equal to or more than 75%, while the posterior probability (PP) value is equal to or more than 0.90, as indicated on the branch node. By placing the seven Pandanales taxa as an outgroup, in Dioscoreales, the Dioscorea clade was sister to the Burmannia + Tacca + Trichopus clade. In the Dioscorea clade, two distinct groups can be observed-one of the groups contains five species, including D. collettii, D. futchauensis, D. quinquelobata, D. villosa, and D. zingiberensis, while all the other species were placed in the other group. A moderate PP value (PP = 0.76) was observed on the branch of the BI tree between the D. futschauensis + D. quinquelobata clade and D. zingiberensis. However, this branch was supported by the ML tree, in which a BS value of 77% was recorded. Based on current circumscription, Dioscorea exhibited a monophyletic relationship. A distinct divergence was recorded at the root of the Dioscorea clade, of which five species, including D. collettii, D. futschauensis, D. quinquelobate, D. villosa, and D. zingiberensis, formed a group that was separated from the other members of Dioscorea. For the four species of Dioscorea used in this study, D. depauperata was closely related to D. glabra, and they were clustered with two other species, where D. alata was first to diverge, followed by D. brevipetiolata. D. pyrifolia was closely related to D. aspersa, and both of them formed a group with D. persimilis. futchauensis, D. quinquelobata, D. villosa, and D. zingiberensis, while all the other species were placed in the other group. A moderate PP value (PP = 0.76) was observed on the branch of the BI tree between the D. futschauensis+D. quinquelobata clade and D. zingiberensis. However, this branch was supported by the ML tree, in which a BS value of 77% was recorded. Based on current circumscription, Dioscorea exhibited a monophyletic relationship. A distinct divergence was recorded at the root of the Dioscorea clade, of which five species, including D. collettii, D. futschauensis, D. quinquelobate, D. villosa, and D. zingiberensis, formed a group that was separated from the other members of Dioscorea. For the four species of Dioscorea used in this study, D. depauperata was closely related to D. glabra, and they were clustered with two other species, where D. alata was first to diverge, followed by D. brevipetiolata. D. pyrifolia was closely related to D. aspersa, and both of them formed a group with D. persimilis.

Discussion
In this study, the cp genomes of four Dioscorea species that are native to Thailand were sequenced and assembled, and a comprehensive comparative analysis of these cp genomes was performed using other published cp genomes of the same genus obtained from NCBI GenBank. The cp genome sizes and characteristics of the four studied Dioscorea species, D. brevipetiolata, D. depauperata, D. glabra, and D. pyrifolia, are within a range that is similar to other reported cp genomes of Dioscorea, for which the complete cp genome sequence length is between 152,039 bp (D. burkilliana; GenBank no. MG805605) and 155,406 bp (D. rotundata; GenBank no. KJ490011). Within Dioscoreaceae, members of Tacca (GenBank nos. KX171420 and KT719235) have a larger cp genome size when compared to Dioscorea, which is approximately 163,000 bp, while the cp genome size of Trichopus zeylanicus subsp. travancoricus (GenBank no. MK674169) was 153,497, which is similar to that of Dioscorea. The repeat sequences found in the cp genome are products of the rearrangement and recombination of sequences in the cp genome [53]. Long repeat sequences play a role in inducing indels and identifying mutational hotspots [54], while SSRs are potentially useful in the characterization of closely-related species, as well as genetic differentiation at an intraspecific level, due to their high variability and reproducibility [55]. Based on our findings, we were unable to identify any patterns that could correlate the cp genome size and structure with the number of repeat sequences found. On the other hand, the finding from the IR border analysis somehow suggested that chloroplast genome evolution in Dioscorea seems to be highly conserved; the sequence length of the IR regions was similar, between 25,213 bp (D. schimperiana; GenBank no. MG805614) and 25,591 bp (D. collettii; GenBank no. KY996495). The expansion and contraction of the IR region allowed the movement of several genes adjacent to the junctions, including the rps19 and ndhF genes, to cross into the neighboring region. Although expansion and contraction of the IR region are common in the plant cp genome, they can differ in some degree [56]. Yet, the movement of genes crossing over the border in Dioscorea seems to not be drastic, suggesting that the evolution of the IR region in Dioscorea could be in its beginning stage.
Based on the finding from mVISTA, similar results of divergent regions have been previously reported in Dioscorea cp genomes, including ndhF, ycf 1, trnK-trnQ, trnS-trnG, trnC-petN, trnE-trnT, petG-trnW-trnP, and trnL-rpl32 [22]. Moreover, the divergent regions include trnK-trnQ, trnS-trnG, trnC-petN, trnE-trnT, petG-trnW-trnP, and trnL-rpl32, where previous reports found that these divergent regions were mostly present in the SSC and LSC regions and showed a trend toward more rapid evolution [22,[57][58][59]. With that in mind, DNA markers in the form of indels and nucleotide repeats could also be explored for species discrimination of Dioscorea. For example, two indel markers were developed from the complete cp genomes of six Ipomoea species [60], and five species-specific indel markers were developed to authenticate five species of Panax [61]. With at least eight different variable regions found in the alignment of the 25 cp genome sequences, based on mVISTA, as well as hundreds of repeats identified in the cp genome of Dioscorea, with several species of Dioscorea as important resources in traditional medicine production [62], novel indel and repeat markers could be developed to aid in species identification and authentication of these important species.
In a previous work, Zhao et al. [22] identified eight highly variable regions from a sliding window analysis of the cp genome sequences of nine species of Dioscorea. Among these eight highly variable regions, the ycf 1 gene was also reported in our work, but the regions trnC, rpl32, and ndhF-rpl32, reported in our study, are new information. The difference in the discovery of novel hotspot regions may be due to the number of cp genome sequences used during the analysis; Zhao et al. [22] utilized nine species of Dioscorea, while 25 species of Dioscorea are included in this study. Altogether there is no study that evaluates the minimum cp genome sequences that should be included in a sliding window analysis to ensure high accuracy in hotspot detection, taxon sampling from eight to ten is recommended in search of a specific barcode [63]. Yet, an increase in taxon sampling may improve the accuracy of sequence alignment, which will further affect the information of variable sites delivered [64]. Therefore, we do not exclude the possibility that the hotspot regions identified in our study might be superior to those proposed by Zhao et al. [22] in terms of phylogenetic resolution at the species level. However, further experiments to verify the discrimination strength of these regions are required.
To our knowledge, this is the first work on phylogenetic tree reconstruction of Dioscorea that involved 31 different species, based on the complete cp genome sequence. Evidently, the use of the complete cp genome sequence in phylogenetic tree reconstruction of complicated genera has been recommended by many researchers, as it could yield promising results [65,66]. For example, the molecular placement of D. aspersa, D. glabra, and D. persimilis was ambiguous when using five cp and two mitochondrial DNA sequences [67], but was resolved in this study. In the same study, the phylogenetic tree, reconstructed using 48 Dioscorea taxa, revealed similar topology when compared to the phylogenetic tree based on the complete cp genome sequences. The divergence of the five species in our study complimented the grouping of the taxa from the section Stenophora [67]. The section Stenophora is recognized as the most basal clade in the phylogeny of Dioscorea [68], while the genus was proposed with more than 23 sections, with differing opinions being put forward. Nonetheless, a fully resolved phylogenetic tree was obtained in this study; it is recommended that an acceptable sample size ought to be achieved prior to phylogenetic reconstruction for taxonomic classification purposes. Although there is literature proposing the use of the complete cp genome sequence as super-barcodes that are effective in delimiting closely related species [69], performing NGS on a large number of samples might not be favorable to some laboratories due to sequencing cost and availability of sequencing facilities. Thus, identifying a powerful DNA region that is adequate for phylogenetic analysis of Dioscorea, as suggested in the previous paragraph on the DNA barcoding of Dioscorea, is deemed requisite.

Conclusions
The genomic data generated in this study can be potentially useful for the authentication of Dioscorea species, and can be further developed into powerful species-specific markers of Dioscorea species, using both subtle details and the overall cp genome. Additionally, beyond reducing the necessary research time, funding, and the number of plant species studied, the findings from the phylogenetic analysis of Dioscorea based on the complete cp genome sequences have provided much insight into the molecular placement and phylogenetic relationship among the members of Dioscorea used in this study. Further taxonomic classification of Dioscorea should also consider the use of this NGS dataset for reconstruction of phylogenetic trees at the genome level, to aid in combing out the taxonomic uncertainties among these complicated species.