Complete Chloroplast Genome of Paphiopedilum delenatii and Phylogenetic Relationships among Orchidaceae

Paphiopedilum delenatii is a native orchid of Vietnam with highly attractive floral traits. Unfortunately, it is now listed as a critically endangered species with a few hundred individuals remaining in nature. In this study, we performed next-generation sequencing of P. delenatii and assembled its complete chloroplast genome. The whole chloroplast genome of P. delenatii was 160,955 bp in size, 35.6% of which was GC content, and exhibited typical quadripartite structure of plastid genomes with four distinct regions, including the large and small single-copy regions and a pair of inverted repeat regions. There were, in total, 130 genes annotated in the genome: 77 coding genes, 39 tRNA genes, 8 rRNA genes, and 6 pseudogenes. The loss of ndh genes and variation in inverted repeat (IR) boundaries as well as data of simple sequence repeats (SSRs) and divergent hotspots provided useful information for identification applications and phylogenetic studies of Paphiopedilum species. Whole chloroplast genomes could be used as an effective super barcode for species identification or for developing other identification markers, which subsequently serves the conservation of Paphiopedilum species.


Introduction
The sequence of chloroplast (cp) genomes can provide information for studying genetic relationships, gene transfer, cloning, and domestication of species [1]. Much research has shown the role of plastomes in practice. The project of chloroplast genome sequencing and barcoding Fritillaria [2] helped identify Fritillaria species for medical use. The sequencing of the chloroplast genome of the rice variety Nagina-22 [3] enriched genetic resources to support the breeding and crossbreeding of next-generation rice varieties. Na Tian et al. (2018) sequenced and analyzed the chloroplast genome of Epipremum aureum that contributed to the propagation and support of gene transfer of this medicinal plant [4]. Shuai Guo et al. (2018) successfully sequenced the chloroplast genome of Paeonia ostii which enhanced the productivity of this medicinal herb [5]. In orchids, the sequencing and analysis of cp genomes helped explain phylogenetic relationships and the evolutionary path of Orchidaceae [6,7]. genome sequence was of P. delenatii. This complete plastome sequence was deposited in GenBank under accession MK463585.
The assembled chloroplast genome is 160,955 base pairs (bp) in length ( Figure 1). It exhibits a typical quadripartite structure of the large single-copy (LSC, 89,869 bp) and small single-copy (SSC, 2,694 bp) regions, separated by a pair of inverted repeat regions (IRs, 34,196 bp each). There are 107 unique genes that were annotated (Table 1), including 68 protein-coding genes, 30 transfer RNA genes, 4 ribosomal RNA genes, and 5 pseudogenes. In particular, there are 23 genes in double copies (Table 1).

Figure 1.
Gene map of the Paphiopedilum delenatii chloroplast genome. Annotated genes are colored according to functional categories. Genes lying outside of the outer circle were transcribed clockwise, while those inside the circle were transcribed counterclockwise. The innermost, darker gray corresponds to GC content, while the lighter gray corresponds to AT content. IR, inverted repeat; LSC, large single-copy region; SSC, small single-copy region; SSU, small subunit; LSU, large subunit; RNA, ribonucleic acid; NADH, nicotinamide adenine dinucleotide (NAD) + hydrogen (H); ATP, Adenosine Triphosphate; RubisCO, Ribulose-1,5-bisphosphate carboxylase/oxygenase. Figure 1. Gene map of the Paphiopedilum delenatii chloroplast genome. Annotated genes are colored according to functional categories. Genes lying outside of the outer circle were transcribed clockwise, while those inside the circle were transcribed counterclockwise. The innermost, darker gray corresponds to GC content, while the lighter gray corresponds to AT content. IR, inverted repeat; LSC, large single-copy region; SSC, small single-copy region; SSU, small subunit; LSU, large subunit; RNA, ribonucleic acid; NADH, nicotinamide adenine dinucleotide (NAD) + hydrogen (H); ATP, Adenosine Triphosphate; RubisCO, Ribulose-1,5-bisphosphate carboxylase/oxygenase. The overall GC content of P. delenatii is 35.6%. The GC contents of the LSC, SSC, and IR regions are 33.0%, 28.5%, and 39.3%, respectively ( Table 2). The GC content in the IR region was higher than both LSC and SSC regions in all examined Paphiopedilum plastomes. This result agreed with previous studies [6,7,26,27]. The presence of four ribosomal RNA (rRNA) genes is considered to be the reason for high GC contents in the IR regions [6,26]. However, another hypothesis proposed that the higher GC content evolution in IR regions does not relate to natural selection but to GC-biased gene conversion (gBGC) [7,27]. Accordingly, GC to AT mutations were unstable while AT to GC ones were fixed after gBGC, and hence the gBGC process prefers repairing DNA mismatches in Plants 2020, 9, 61 4 of 17 recombining DNA over evolutionary time [28]. Meanwhile, IRs are considered to be recombination hotspots due to their identical inverted repeat structures, which increase the frequency of intraplastomic recombination [29,30] and hence increase gBGC.  GC content has been reported to be different not only between different regions of a genome, but also between different genomes of different species [31,32]. Overall GC content of some orchid plastomes were recorded, e.g., Dendrobium moniliforme (37.54%), Goodyera schlechtendaliana (37.07%), Vanilla aphylla (35.02%) [33], Cremastra appendiculata (37.2%), Calanthe davidii (36.9%), Epipactis mairei (37.2%), and Platanthera japonica (37%) [6], with their plastome lengths ranging from 148,778 to 162,835 bp. GC content of the Paphiopedilum species in our study range from 35% to 35.6%, with genome lengths of 154.699 to 162.682 bp, similar to other orchids. Hence, Paphiopedilum and Vanilla contain lower a GC percent, which has been suggested to mean that the sequence variability was higher toward the enrichment of AT nucleotides [33,34]. The difference in GC content of the nuclear genome has been proposed as a useful value for identification of species [31]. Karimi et al. (2018) introduced the GCSpeciesSorter tool for accurately and quickly determining GC content and, hence, classifying species in a mix of DNA relationships for metagenomic studies [35]. Hence more studies on the GC content of plant plastomes might provide a useful measurement for the identification of species.
The comparison of four Paphiopedilum plastomes showed that P. delenatii inherits a similar conserved plastome structure to its Paphiopedilum sisters ( Table 2). Some differences are that P. niveum has the least total gene number and coding sequence (CDS) number and has no protein-coding gene inf A. P. dianthum contains four genes, ycf 68(× 2) and orf 42(× 2), that are not observed in the other three plastomes. P. delenatii is distinguished by double copies of trnL_UAG and rpl32 (Table S1).
In P. delenatii, there are six ndh genes, all of which are pseudogenes: ndhB(× 2), ndhC, ndhD, ndhJ, and ndhK. ndhA, ndhE, ndhF, ndhG, ndhH, and ndhI are entirely absent from the P. delenatii cp genomes. The same applies to P. armeniacum and P. niveum chloroplast genomes. As for P. dianthum, the RefSeq annotation showed two pseudogenes: ndhB and ndhD. This result was consistent with the report of Guo et al. (2012) [36]. However, we aligned the ndhC, ndhJ, and ndhK sequences of P. delenatii to the three chloroplast genomes of Paphiopedilum and found a version of ndhJ present in the P. dianthum cp genome ( Figure S1).
Ndh genes code for the enzyme NADH dehydrogenase [37,38] that is responsible for electron transport of chloroplasts. The loss and variation of ndh genes in IR boundaries in orchid species have been the focus of multiple investigations [23,39,40]. Here, we compared the boundaries between two inverted repeat regions, i.e IRa and IRb, and LSC and SSC regions of P. delenatii to those of 14 other orchid species from 5 subfamilies of Orchidaceae ( Figure 2). All Cypripedioideae species with high-resolution plastid genome annotations were included. P. delenatii exhibited a highly-conserved pattern of IR boundaries to those of P. dianthum, P. armeniacum, as well as Vanilla poompona (Vanilloideae): rpl22 at the IRb/LSC border, 2 copies of the ycf1 gene in IRs, and a pseudogenized rpl22 next to the border of IRa/LSC. These 4 species possessed rather small SSCs (2037-3666 bp) compared to the remaining 11 species (13,066-21,921 bp). Interestingly, in the other four Cypripedioideae species, which were Phragmipedium longifolium, Cypripedium formosanum, Cypripedium japonicum, and Cypripedium macranthos, ycf1 genes shifted to the borders of IRs and SSC, leaving one functional ycf1 at the SSC/IRa junction and one pseudogene in IRb, next to the IRb/SSC. The three Cypripedium species had the ndhF gene in the SSC region, which was absent in all Paphiopedilum. These Cypripedioideae had very similar features in IR junctions to those of Orchidoideae and Apostasioideae. In Epidendroideae, IR boundaries of the 4 examined species exhibited higher heterogeneity: ndhF is present in Calanthe davidii and Neottia ovata, but not in Eulophia zollingerri and Cattleya crispate; ycf1 is mostly in the SSC region but at the IRb/SSC border in Neottia ovata. In the case of P. niveum, the IR regions, which contain six palindromic pairs instead of one (Table S2), were not determined. Hence, this species was not included in Figure 2 and its plastome structure should be interpreted further.
Orchidoideae and Apostasioideae. In Epidendroideae, IR boundaries of the 4 examined species exhibited higher heterogeneity: ndhF is present in Calanthe davidii and Neottia ovata, but not in Eulophia zollingerri and Cattleya crispate; ycf1 is mostly in the SSC region but at the IRb/SSC border in Neottia ovata. In the case of P. niveum, the IR regions, which contain six palindromic pairs instead of one (Table S2), were not determined. Hence, this species was not included in Figure 2 and its plastome structure should be interpreted further. The IR region is highly conserved and stable in the chloroplast genome. The expansion and contraction of the IR region is a common characteristic of the chloroplast genome. Luo et al. (2014) proposed four types of IR/SSC junction when they examined seven orchids [41]. In our study, the IR pattern of Cypripedium and Orchidoideae and Apostasioideae species matched with type I; Phragmipedium longifolium matched with type III; Eulophia zollingery and Cattleya crispata matched with type IV. None of our examined species matched with Type II, which was the overlap of Ψycf1 and ndhF genes [41]. Instead, Vanilla pompona, Calanthe davidii, and Neottia ovata expressed entirely new IR patterns ( Figure 2). Together with the shift of the ycf1 gene, the presence or absence of the ndhF gene was one of the factors observed in different IR/SSC patterns. According to Guo et al. [36], among the Cypripedioideae, Cypripedium species inherited the ndhF gene from their ancestors; this gene has been lost from the other genera. ndhF gene loss was previously proposed to be correlated The IR region is highly conserved and stable in the chloroplast genome. The expansion and contraction of the IR region is a common characteristic of the chloroplast genome. Luo et al. (2014) proposed four types of IR/SSC junction when they examined seven orchids [41]. In our study, the IR pattern of Cypripedium and Orchidoideae and Apostasioideae species matched with type I; Phragmipedium longifolium matched with type III; Eulophia zollingery and Cattleya crispata matched with type IV. None of our examined species matched with Type II, which was the overlap of Ψycf 1 and ndhF genes [41]. Instead, Vanilla pompona, Calanthe davidii, and Neottia ovata expressed entirely new IR patterns ( Figure 2). Together with the shift of the ycf 1 gene, the presence or absence of the ndhF gene was one of the factors observed in different IR/SSC patterns. According to Guo et al. [36], among the Cypripedioideae, Cypripedium species inherited the ndhF gene from their ancestors; this gene has been lost from the other genera. ndhF gene loss was previously proposed to be correlated with the instability of IR/SSC boundary [40]. More studies are required to better understand IR evolution. However, this study emphasized the diversity of IR/SSC boundaries in orchids.

Repeat and Microsatellite Analysis
For Paphiopedilum delenatii, we found 645 repeats with lengths from 30 to 58 bp (Table S3). The number of forward repeats were the most common (176/645), followed by palindromic repeats (168/645), reverse repeats (167/645), and complement repeats (133/645). Most repeats were inside intergenic spacers. Notably, 23 repeat sequences were entirely located in the ycf2 gene. There were 87 simple sequence repeats (SSRs) or microsatellite sequences identified in the P. delenatii chloroplast  (Table S4). Fifty SSR loci were in intergenic regions, while 21 were in the coding areas (12 of which were in ycf1 gene). Most SSR loci were mononucleotide repeats with AT motifs (57/87).
Repeats are units of DNA that are similar in the genome. There are short repeats and longer repeats. SSR (simple sequence repeat), or microsatellite, is a type of low-complexity, short repeat with 1-6 nucleotides. Generally, microsatellite SSRs are widely distributed throughout the genome and have a great effect on recombination and rearrangement of the genome [42,43]. In our study, there were no tetranucleotide or longer repeats in the cp genome. This result was consistent with previous reports that most SSRs include mono-and di-nucleotide repeats while tri-, tetra-, penta-, and hexa-nucleotide repeat sequences were detected at much lower frequencies [6]. The longer repeat, known as a minisatellite, contains 10-100 nucleotides. In term of direction and complementary, these repeats are divided into four types: forward (direct) repeats, reverse repeats (also known as inverted repeat -IR), complement repeats, and palindromic repeats (reverse complement repeat) [44]. Both microsatellite and minisatellite repetitive sequences play significant roles in species identification. SSRs were used as DNA barcodes to clear identify 5 genotypes of Solanum melongena L. by Chinnappareddy et al. [45]. In Orchidaceae, SSR markers were developed and utilized as identification tools in various studies [18,[46][47][48] due to their high reproducibility and variability [45,49,50]. In particular, SSRs in the chloroplast genome were reported to have a high level of polymorphism among species and loci [51]. Sets of cp SSRs were also isolated and developed for recognizing valuable plants, serving conservation genetics, investigation of chloroplast genetic structure, adaptive evolution, and population [52][53][54][55][56][57]. Even single loci, i.e., trnL and trnL-F, in chloroplast were used to develop specific primers for amplifying SSR sequences for a population genetic study [58]. The longer repeat minisatellites were supposed to have specific mechanisms of evolution and function, forming common, as well as unique, repeat patterns [44,59], hence they are important tools for taxonomic and phylogenetic studies. Eight out of 13 species in the Phoenix genus (Arecaceae) were unambiguously distinguished using minisatellites developed from a 700 bp region in the chloroplast spacer trnG (GCC)-trnfM (CAU) [60]. Our study provided primary data of SSRs and minisatellite repeats for further research on identifying applications.

Phylogenetic and Species Resolution Analyses
We included chloroplast genomes of P. niveum and all the Orchidaceae species examined in the above IR boundaries section to understand the phylogenetic relationship between P. delenatii and other species across subfamilies ( Figure 3). Artemisia argyi was set as the outgroup. Firstly, a whole-plastome tree was constructed. As expected, P. delenatii along with other Paphiopedilum again showed a high similarity and close phylogenetic relationship to each other. The 6 species of Cypripedioideae were clustered together into one clade, excluding C. macranthos and C. formosanum. Although the IR boundary analysis showed similar patterns between V. pompona and those of Paphiopedilum accessions, in term of the whole plastome, V. pompona was more similar to Epidendroideae and Orchidoideae. Next, we constructed phylogenetic trees based on two popular short barcodes: matK and rbcL. All three phylogenetic trees presented a full separation of 16 species. This might be because the analysis consisted of distinct species. In contrast to the plastome tree, Apostasia wallichii was now segregated in an independent monophyletic branch, as expected, since it belongs to a different subfamily. Furthermore, C. formosanum was grouped with the members of its genus. A dot plot similar analysis on the whole-genome alignment of P. delenatii to other species ( Figure S2) found that C. formosanum and A. wallichii both had an inverse similarity fragments located in the LSC region, at the position of 10,000-80,000 bp. This led to greater genetic similarities between C. formosanum and A. wallichii, which grouped them into one clade in the whole-plastome tree. C. macranthos was closer to its sisters of Cyprideoideae in terms of nucleotide polymorphism ( Figures S3 and S4), but differed in terms of amino acid variations, which explains the separation of this species in the phylogenetic trees.
C. formosanum and A. wallichii both had an inverse similarity fragments located in the LSC region, at the position of 10,000-80,000 bp. This led to greater genetic similarities between C. formosanum and A. wallichii, which grouped them into one clade in the whole-plastome tree. C. macranthos was closer to its sisters of Cyprideoideae in terms of nucleotide polymorphism ( Figure S3-S4), but differed in terms of amino acid variations, which explains the separation of this species in the phylogenetic trees. The chloroplast genome was proposed to be used as meta-barcode by some previous studies [61][62][63][64]. Hence, two universal mini-barcodes, matK and rbcL, were established in order to compare the effects of species resolution under the tree-based method. From the above analyses, using the whole genome for phylogenetic relationship analysis might not be practical due to the unexpected problem of inversion. However, for barcoding, the species separation, but not relationship, is of first concern. In this respect, using the whole plastome as a super-barcode is effective. Besides, we also analyzed genetic distance matrices among species using plastome, matK and rbcL, data independently (Table S5). The average, minimum, and maximum of distances were calculated. Except for rbcL with a low average value (0.034), the average genetic distance by matK (0.122) and by plastome (0.112) were all high. Although the average values were not much different between plastomic and matK data, the minimum value of matK (0.005) was much lower than that of the plastome (0.012). While some previous studies showed that matK was not able to identify all species in some close taxonomic groups [65,66], our result suggests that the entire cp genome could do better than matK in identifying closely related species. The chloroplast genome was proposed to be used as meta-barcode by some previous studies [61][62][63][64]. Hence, two universal mini-barcodes, matK and rbcL, were established in order to compare the effects of species resolution under the tree-based method. From the above analyses, using the whole genome for phylogenetic relationship analysis might not be practical due to the unexpected problem of inversion. However, for barcoding, the species separation, but not relationship, is of first concern. In this respect, using the whole plastome as a super-barcode is effective. Besides, we also analyzed genetic distance matrices among species using plastome, matK and rbcL, data independently (Table S5). The average, minimum, and maximum of distances were calculated. Except for rbcL with a low average value (0.034), the average genetic distance by matK (0.122) and by plastome (0.112) were all high. Although the average values were not much different between plastomic and matK data, the minimum value of matK (0.005) was much lower than that of the plastome (0.012). While some previous studies showed that matK was not able to identify all species in some close taxonomic groups [65,66], our result suggests that the entire cp genome could do better than matK in identifying closely related species.

Divergence of Hotspot Regions
We calculated the nucleotide variability in the chloroplast genomes of the four analyzed species: Paphiopedilum delenatii, Paphiopedilum armeniacum, Paphiopedilum niveum, and Paphiopedilum dianthum. The P. delenatii plastome was highly conserved compared to other cp genomes of Paphiopedilum species. The number of single nucleotide polymorphism (SNP) was 24,211 out of 170,423 bp of the alignment. The values of nucleotide diversity (pi) ranged from 0 to 0.34889 and the diversity threshold was 0.079. However, the diversity threshold decreased at 0.0377 when the SSC regions were excluded from the analysis. At this threshold, 11 highly-variable regions were suggested as potential markers in species identification and phylogeny study of the Paphiopedilum genus (Figure 4). These 11 highly-variable regions included 1 protein-coding gene (clpP) and 8 intergenic spacers (matK-rps16, trnR_UCU-atpA, psbM-trnD_GUC, trnE_UUC-trnT_GGU, accD-psaI, psbE-petL, trnP_UGG-psaJ, and rpl23-trnL_CAU) in the LSC region; 1 intergenic spacer (ycf 1-rps15) in the repeat region IRb; and 1 intergenic spacer (ccsA-psaC) in the SSC region (Table S6).

Divergence of Hotspot Regions
We calculated the nucleotide variability in the chloroplast genomes of the four analyzed species: Paphiopedilum delenatii, Paphiopedilum armeniacum, Paphiopedilum niveum, and Paphiopedilum dianthum. The P. delenatii plastome was highly conserved compared to other cp genomes of Paphiopedilum species. The number of single nucleotide polymorphism (SNP) was 24,211 out of 170,423 bp of the alignment. The values of nucleotide diversity (pi) ranged from 0 to 0.34889 and the diversity threshold was 0.079. However, the diversity threshold decreased at 0.0377 when the SSC regions were excluded from the analysis. At this threshold, 11 highly-variable regions were suggested as potential markers in species identification and phylogeny study of the Paphiopedilum genus (Figure 4). These 11 highly-variable regions included 1 protein-coding gene (clpP) and 8 intergenic spacers (matK-rps16, trnR_UCU-atpA, psbM-trnD_GUC, trnE_UUC-trnT_GGU, accD-psaI, psbE-petL, trnP_UGG-psaJ, and rpl23-trnL_CAU) in the LSC region; 1 intergenic spacer (ycf1-rps15) in the repeat region IRb; and 1 intergenic spacer (ccsA-psaC) in the SSC region (Table S6). In the study of Bi et al. (2018), nucleotide diversity of eight Fritillaria species ranged from 0 to 0.02583 [67]. The pi value was from 0 to 0.05872 for the nine cp genomes of Eragrostis species in the study of Somaratne et al. (2019) [68]. In comparison with those studies, the pi value (0-0.34889) in our study was much higher. The reason was that there were two fragments with extremely high divergence within the SSC region that were not present in the mentioned studies. Therefore, we tried another analysis in which the SSC region was removed. The threshold was significantly decreased, at 0.0377 instead of 0.079, for the whole cp genomes. The high divergence of SSC was also reported recently by Cui et al. (2019). A comparison of 10 ginger species showed an average value of nucleotide variability of 0.0187, while it was only 0.0075 when comparing 4 species of the same family [69]. Therefore, SSC seemed to rapidly evolve compared to LSC. The two inverted regions, IRa and IRb, were quite conserved, with low nucleotide diversity. In addition, the narrow endemism is also a prominent feature of Paphiopedilum species. Of the known species, 72% are narrowly endemic with very limited distribution [70]. The local distribution and ecological separation might be the reason for considerable genetic differences between species in the same genus and might clarify the high nucleotide diversity of Paphiopedilum species in our study.
Using short sequences for the identification of a species is still the current universal method due to its simple and time-saving traits. Indeed, the average divergence of matK (0.122) was even higher than that of the plastome (0.112) in our study. The reason for this is that the plastome contains both diverse and conserved regions while matK is a selected, high-variable locus [71]. This result proved In the study of Bi et al. (2018), nucleotide diversity of eight Fritillaria species ranged from 0 to 0.02583 [67]. The pi value was from 0 to 0.05872 for the nine cp genomes of Eragrostis species in the study of Somaratne et al. (2019) [68]. In comparison with those studies, the pi value (0-0.34889) in our study was much higher. The reason was that there were two fragments with extremely high divergence within the SSC region that were not present in the mentioned studies. Therefore, we tried another analysis in which the SSC region was removed. The threshold was significantly decreased, at 0.0377 instead of 0.079, for the whole cp genomes. The high divergence of SSC was also reported recently by Cui et al. (2019). A comparison of 10 ginger species showed an average value of nucleotide variability of 0.0187, while it was only 0.0075 when comparing 4 species of the same family [69]. Therefore, SSC seemed to rapidly evolve compared to LSC. The two inverted regions, IRa and IRb, were quite conserved, with low nucleotide diversity. In addition, the narrow endemism is also a prominent feature of Paphiopedilum species. Of the known species, 72% are narrowly endemic with very limited distribution [70]. The local distribution and ecological separation might be the reason for considerable genetic differences between species in the same genus and might clarify the high nucleotide diversity of Paphiopedilum species in our study.
Using short sequences for the identification of a species is still the current universal method due to its simple and time-saving traits. Indeed, the average divergence of matK (0.122) was even higher than that of the plastome (0.112) in our study. The reason for this is that the plastome contains both diverse and conserved regions while matK is a selected, high-variable locus [71]. This result proved the species separation capability of this mini-barcode in comparison with the super one. Common markers, such as the internal transcribed spacer (ITS) in the nucleus and cp loci rbcL, matK, ycf 1, trnL, trnL-F, atpF-atpH, trnH-psbA, etc., were the most used. However, as no single locus can resolve the entire plant species [72,73], new variable sites are still being sought [6,10,12,74,75]. In the chloroplast genome, SNPs were not random but clustered as "hotspots" [76], which were defined as highly-variable loci [77]. A series of new findings for species-specific barcodes was developed recently. Protein-coding gene clpP and intergenic spacer rps15-ycf 1 from our proposal were also recommended previously [76,[78][79][80]. The hot region of trnE-trnT was matched with the study of Zhao et al. (2018) [77]. Although rps16 and its intergenic spacer with trnQ were suggested in much research [10,11,26,74,81], our study introduced a new intergenic spacer of this gene, i.e., rps16-matK. However, searching for new variable sites as candidate barcodes is just the first step. Primer designing and amplification success are also required for barcoding effects [10,25,74].
Besides barcoding applications, hot region information is also used for developing other PCR-based identification techniques. rDNA-ITS (internal transcribed spacer) sequences were used to design species-specific SCAR (sequence characterized amplified regions) markers, in which 3 primer pairs-SCAR-600armF/Pap-ITS2R, SCAR-300delF/Pap-ITS2R, and SCAR-700micF/Pap-ITS2R-were effectively used to amplify and recognize three Paphiopedilum species (P. armeniacum, P. micranthum, and P. delenatii) and their hybrids [17]. Similarly, two DNA sequences, rpoC2 and atpF-atpH, were reported to contain species-specific SNPs, insertions, and deletions. This information was utilized to develop 3 species-specific primers for quickly identifying species: Cypripedium guttatum var. koreanum, C. japonicum, and C. formosanum of the Korean Cypripedium genus. This ARMS (amplification refractory mutation system) method was also based on electrophoresis technique without a sequencing step [15]. In another study (2014), the divergent nucleotide sequence of the ITS region and 3 cpDNA fragments were amplified and subsequently cut with several restriction enzymes to create species-specific types of DNA patterns. The PCR-restriction fragment length polymorphism (PCR-RFLP) approach was successful in the identification of 25 native Dendrobium species in Thailand [16].

Plant Material, DNA Extraction, and Sequencing
The sample of Paphiopedilum delenatii plant ( Figure 5) was identified by the shape of the flowering plant and stored at −80 • C at Tay Nguyen Biological Institute, Vietnam. First, 0.2 g of leaf was ground with 5 µL proteinase K, 3mL of a mixture of beta-mer and extract buffer at 65 • C, then incubated for 30 min at 65 • C. The sample was had 600 uL P:C:I added and was centrifuged for 10 min at 10,000 rpm. After adding 5 uL of RNAse and incubating at 37 • C, the sample had 600 uL C:I added. DNA was precipitated by isopropanol and incubated overnight at −20 • C. The pellet, obtained by centrifugation, was washed with 70%, 80%, and 90% ethanol. DNA was suspended in 25 uL TE and stored at −20 • C. The library construction and whole-genome sequencing of P. delenatii was performed by GENEWIZ (South Plainfield, NJ, USA). Sequencing was carried out on an Illumina HiSeq (Genewiz, South Plainfield, USA) using a 2 × 150 paired-end (PE) configuration.

Read Data Processing and Chloroplast Genome Assembly
Demultiplexing was performed by bcl2fastq 2.17. Raw data was filtered as follows: (1) discard pair-end reads with adapter, (2) discard pair-end reads when the content of N bases is more than 10% in either read, and (3) discard pair-end reads when the ration of bases of low quality (Q < 20) is more than 0.5 in either read. The chloroplast genome of P. delenatii was reconstructed using NOVOPlasty 2.7.2 [82], with the complete chloroplast genome of P. armeniacum (RefSeq: NC_026779.1) as the reference genome and the rbcL from the same plastid genome of P. armeniacum as the seed sequence. The annotation was done by GeSeq [83] and further manually curated by comparison to the annotations of P. armeniacum, P. dianthum, and P. niveum in GenBank. The genome map was drawn by OGDRAW [84].

Repeat Sequence and Microsatellite Identification
REPuter [85] was used to calculate DNA repeats, including forward, reverse, complement, and palindromic kinds of repeat sequences. The repeats were identified with a hamming distance of 3 and minimum repeat size of 30 [6]. MISA [86] was used to identify microsatellite sequences with default parameters.

Examination of IR Junctions
We manually examined the IR junctions of all included orchid species. Annotations of IRs, SSC, LSC, and genes were based on their respective annotations in the RefSeq database. For genomes without IR annotations, we used REPuter to identify their pairs of inverted repeats.

Phylogenetic Analysis
The phylogenetic analysis was based on the complete genome sequences of 16 orchid species under the maximum likelihood criterion and the GTR + I + G nucleotide substitution model using R package phangorn [87]. Node was calculated from 1000 bootstrap replicates. Figtree [88] was used to visualize the resulting tree. The multiple alignment data of 16 plastomes was used to calculate variable sites and genetic distance matrices using MEGA [89]. MAFFT [90] was used to pairwise align and construct dot-plot graph of these 16 plastomes.

Read Data Processing and Chloroplast Genome Assembly
Demultiplexing was performed by bcl2fastq 2.17. Raw data was filtered as follows: (1) discard pair-end reads with adapter, (2) discard pair-end reads when the content of N bases is more than 10% in either read, and (3) discard pair-end reads when the ration of bases of low quality (Q < 20) is more than 0.5 in either read. The chloroplast genome of P. delenatii was reconstructed using NOVOPlasty 2.7.2 [82], with the complete chloroplast genome of P. armeniacum (RefSeq: NC_026779.1) as the reference genome and the rbcL from the same plastid genome of P. armeniacum as the seed sequence. The annotation was done by GeSeq [83] and further manually curated by comparison to the annotations of P. armeniacum, P. dianthum, and P. niveum in GenBank. The genome map was drawn by OGDRAW [84].

Repeat Sequence and Microsatellite Identification
REPuter [85] was used to calculate DNA repeats, including forward, reverse, complement, and palindromic kinds of repeat sequences. The repeats were identified with a hamming distance of 3 and minimum repeat size of 30 [6]. MISA [86] was used to identify microsatellite sequences with default parameters.

Examination of IR Junctions
We manually examined the IR junctions of all included orchid species. Annotations of IRs, SSC, LSC, and genes were based on their respective annotations in the RefSeq database. For genomes without IR annotations, we used REPuter to identify their pairs of inverted repeats.

Phylogenetic Analysis
The phylogenetic analysis was based on the complete genome sequences of 16 orchid species under the maximum likelihood criterion and the GTR + I + G nucleotide substitution model using R package phangorn [87]. Node was calculated from 1000 bootstrap replicates. Figtree [88] was used to visualize the resulting tree. The multiple alignment data of 16 plastomes was used to calculate variable sites and genetic distance matrices using MEGA [89]. MAFFT [90] was used to pairwise align and construct dot-plot graph of these 16 plastomes.

Nucleotide Variability Calculation
DnaSP v6.1 was used to extract the parsimony variable site density over the plastid genome alignment of four analyzed Paphiopedilum species with a sliding window (window length ≤ 600 and step size = 200). Nucleotide diversity was calculated by the ratio of Pi and window length. The diversity threshold was 0.079, calculated by the sum of the average and double the standard deviation [67]. Regions with diversity higher than the threshold were recommended as highly variable regions.

Conclusions
In this study, we aimed to expand the genetic resource of the endangered species P. delenatii by next-generation sequencing and chloroplast genome assembly. P. delenatii chloroplast genome exhibited a quadripartite structure of 160,955 bp and a total of 130 genes, which were highly conserved compared to other Paphiopedilum species. All six ndh genes in the P. delenatii cp genome were pseudogenes. The presence or absence of the ndhF gene and the shift of ycf 1 and rpl22 genes on the boundaries between IRs and LSC and SSC regions resulted in different IR/SSC patterns that can be useful in inferring species relationships. A reference of 87 SSRs/minisatellite repeats in P. delenatii was proposed for further research on identification applications. Eight highly-variable regions were suggested as the potential markers in barcoding and phylogeny studies of Paphiopedilum genus. Hence, the sequence data of P. delenatii complete chloroplast genome could be used directly in the identification of Paphiopedilum species or for the development of other identification markers, such as SSR, barcoding, or species-specific PCR-based techniques. Although it is still costly and time-consuming compared to short DNA sequencing, genome sequencing costs have decreased significantly in recent years. Along with the development of other whole genome sequencing without amplification steps, e.g., Nanopore technique, we might hope that sequencing of the whole genome would be easier and convenient in the future.
Supplementary Materials: The following are available online at http://www.mdpi.com/2223-7747/9/1/61/s1, Figure S1: Alignment of ndhC, ndhJ, and ndhK genes on 4 Paphiopedilum using SimpleSynteny; Figure S2: Pairwise dot plot cladograms of P. delenatii and other Orchids; Figure S3: Nucleotide polymorphism of Orchidaceae species based on rbcL region; Figure S4: Amino acid polymorphism and number of variable sites between C. macranthos and other Orchidaceae species using rbcL sequence; Table S1: List of annotated genes in Paphiopedilum plastomes; Table S2: Positions of some palindromic repeat pairs in P. niveum; Table S3: Full list of DNA repeats in P. delenatii cp genome; Table S4: Full list of simple sequence repeats (SSRs) in P. delenatii cp genome; Table S5: Genetic distance matrix of analysis data; Table S6. Variable sites from plastomes of 4 Paphiopedilum species.  Acknowledgments: This study was supported by the Science and Technology Development Fund of Nguyen Tat Thanh University. We thank to Tay Nguyen Institute for Scientific Research for providing Paphiopedilum samples for this study. We are also very grateful to the Computational Biology Center of International University, Vietnam National University for providing the computer resources.

Conflicts of Interest:
The authors declare no conflict of interest.