Comparative Analyses of Chloroplast Genomes Provide Comprehensive Insights into the Adaptive Evolution of Paphiopedilum (Orchidaceae)

: An elucidation of how the selection pressures caused by habitat environments affect plant plastid genomes and lead to the adaptive evolution of plants, is a very intense area of research in evolutionary biology. The genus Paphiopedilum is a predominant group of orchids that includes over 66 species with high horticultural and ornamental value. However, owing to the destructive exploitation and habitat deterioration of wild germplasm resources of Paphiopedilum , it needs more molecular genetic resources and studies on this genus. The chloroplast is cytoplasmically inherited and often used in evolutionary studies. Thus, for this study, we newly sequenced, assembled and annotated ﬁve chloroplast genomes of the Paphiopedilum species. The size of these genomes ranged from 155,886 bp ( P. henryanum ) to 160,503 bp ( P. ‘GZSLKY’ Youyou ) and they contained 121–122 genes, which consisted of 76 protein coding genes, eight ribosomal RNAs, and 37–38 transfer RNAs. Combined with the other 14 Paphiopedilum species, the characteristics of the repeat sequences, divergent hotspot regions, and the condo usage bias were evaluated and identiﬁed, respectively. The gene transfer analysis showed that some fragments of the ndh and ycf gene families were shared by both the chloroplast and nucleus. Although the genomic structure and gene content was conserved, there was a signiﬁcant boundary shift caused by the inverted repeat (IR) expansion and small single copy (SSC) contraction. The lower GC content and loss of ndh genes could be the result of adaptive evolutionary responses to its unique habitats. The genes under positive selection, including accD, matK, psbM, rpl20, rps12, ycf1, and ycf2 might be regarded as potential candidate genes for further study, which signiﬁcantly contribute to the adaptive evolution of Paphiopedilum .


Introduction
During the evolutionary processes of certain plants, changing environments and habitats may impose selective genetic pressures, which leave a footprint of natural selection in the genes involved with environmental adaptation [1].As essential organelles, chloroplast (cp) genomes derived from the endosymbiosis between non-photosynthetic hosts and independent living cyanobacteria play an indispensable role in several vital biochemical processes and the photosynthesis of green plants [2].Over the last decade, advanced modeling has increasingly been employed for resolving the deep phylogeny, phytogeography, as well as the molecular evolution and adaptive diversification of plants with the features of haploid inheritance, self-replication, relatively small size and slow mutation rates compared with the nuclear genome [1][2][3].
In general, the cp genome has a conservative circle structure, gene order, and genetic content in most flowering plants.It is typically comprised of two inverted repeat copies (IRa and IRb), a large single copy region (LSC), and a small single copy region (SSC) [3][4][5].However, although the cp genome being much more conservative than nuclear genomes or mitochondrial genomes, the gene rearrangements, inversion, gene loss, inverted repeats expansion still occur in some green plant lineages.Moreover, there were many mutation events have been found in cp genomes, which are including insertions, deletions, substitutions, and inversions [6].These disparities might be precisely the evidence or embodiment of adaptive evolution, which are considered as the improvement of plant species to adapt the environmental conditions changing and during their evolutionary processes.
Nevertheless, the courses, tempos, and modes of evolutionary and genomic architectural changes in recent speciation background are still not clear, due to the microevolution of cp genomes and genomes genes in flowering plants remaining largely unknown [2].Thus, in-depth comparisons of cp genomes in closely related plant species are urgently required to significantly improve the evolutionary inferences sensitivity with genomic structural knowledge towards a thorough elucidation of the mechanisms, rates, and directionality of cp genome evolution.
The genus Paphiopedilum Pfitz.belongs to the Orchidaceae family, which includes over 66 species, of which 18 are natively distributed as ornamental plants that span Southwest to South China due to their large and exquisite flowers [7][8][9][10][11].Paphiopedilum are often referred to as slipper orchids as they possess a bag-like lip that is akin to ladies' slippers [9][10][11].They have significant ornamental, horticultural, and medicinal value and grow primarily within the cracks in cliffs, or rocky well-drained sites in evergreen broadleaved forests, at altitudes of ~1000 m as a type of terrestrial, lithophytic/epiphytic herb [12][13][14].Their unique ecosystems make them ideal for the study of environmental adaptation and evolution [9][10][11].
Unfortunately, owing to the destructive exploitation and habitat deterioration of wild Paphiopedilum germplasm resources, it has been listed in the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES) and is prohibited from ruthless collection and international exploitation [8,15,16].Thus far, many species of Paphiopedilum have been designated as first-class protected plants in China, as well as being listed on the IUCN Red List of Endangered (EN) even Critically Endangered (CR) Species [15][16][17].
In past years, research into Paphiopedilum has focused primarily on ecological characteristics and cultivation [18][19][20], genetic diversity and differentiation based on DNA molecular markers (e.g., ITS and plastid gene fragments) [21,22], and historical biogeography [14,23].Recently, there were some reports also focused on comparative analysis, phylogeny, and evolution based on the whole chloroplast genomes [9][10][11][12].With the rapid development of the Next-Generation Sequencing (NGS) technologies, many cp genomes of the Orchidaceae family along with the Paphiopedilum species were available in the public database [9][10][11][12][23][24][25][26][27].These assembled cp genomes offered useful genetic data for Paphiopedilum phylogenetic and evolution studies [9][10][11][12]28,29].However, the phylogenetic relationships and evolution of Paphiopedilum are quite complex [12,21,22,30].Investigations into the evolution and adaption of this plant have rarely been reported.The lack of genomewide investigations limits the capacity to identify genomic characteristics that are under natural selection.Consequently, additional molecular and genetic resources are urgently required to provide a genetic basis for comparative cp genomes, phylogeny, evolutionary biology, and effective protection strategies for Paphiopedilum.
Moreover, the chromosomal karyotypes of different Paphiopedilum species are heterogeneous such that variable numbers of alleles present obstacles to genetic research based on nuclear genes [31][32][33].Hence, in this case, the efficient and convenient cp genome which, with many advantages, has become an ideal tool for the genetic and evolutionary analysis of Paphiopedilum.Additionally, according to previous studies, the cp genomes of Paphiopedilum might provide unique opportunities to reveal the boundary shift impacts on cp genome structures and gene evolution due to their exceptional peculiarities [12,34].Owing to the large number of species of the Orchidaceae family in the world, cp genomes have been widely used in their phylogenetics, evolutionary biology, and population genetics [9][10][11][12][13][14][24][25][26][27].
Thus, in this study, we performed comparative analyses to afford comprehensive insights into the evolution and adaptation of the cp genomes of several Paphiopedilum species.Firstly, in addition to a previous study, we sequenced and assembled five cp genomes of Paphiopedilum species.Secondly, we conducted comparative cp genome analyses for these five genomes in addition to the other fourteen chloroplast genomic sequences of Paphiopedilum, which were originally distributed in China [8] and obtained from GenBank (Table S1).
Subsequently, we reconstructed a phylogeny of Paphiopedilum using the cp genomes of forty-one Paphiopedilum individuals combined with four species of a closely related genus and two Lilium species as outgroups (Table S1).Finally, we performed selective pressures to survey whether the protein coding genes were under negative (purifying) selection or positive selection.This study not only provides beneficial knowledge and resources for the conservation and utilization of one of the most world-wide cultivated plant germplasms, but also offers clues into adaptive evolution for the future investigation of epiphytic herbs.

Sampling and Sequencing
In this study, we collected samples in accordance with the laws of the People's Republic of China.The field works were approved by the Chinese Government.To collect the samples, all researchers also received the permission letters from the College of Life Science, Northwest University.We identified the samples based on their phenotype.To represent the Paphiopedilum, four natural species (P.bellatulum, P. barbigerum, P. henryanum, and P. hirsutissimum) were selected based on their morphological characteristics.Further, a new hybrid Paphiopedilum cultivar (P.'GZSLKY' Youyou) with high ornamental and horticultural value created by crossing female P. dianthum and male P. barbigerum parents, was also investigated.All five voucher specimens were deposited in the herbarium of the College of Life Science, Northwest University (No: 20210105030-20210105034).
The fresh young healthy leaves were placed into plastic bags filled with silica gel, and were quickly dehydrated and stored at 20 • C for later use.The total genomic DNA was extracted using CTAB methods [35].Then we verified the DNA quality and quantity using a spectrophotometer and 1% agarose gel electrophoresis, respectively.The genome sequencing was implemented utilizing the Illumina HiSeq 2500 platform with an average insert size of 350 bp.

Genome Assembly and Annotation
The clean data were generated following low-quality data and adaptor filtering from the raw data using FASTP software [36].Subsequently, BAW [37] was adopted to align the clean reads to complete the cp genome of P. purpuratum (NCBI accession number MN535015) [38], which were selected as the reference.Then, the aligned reads were selected for further de novo assembly via GETORGANELLE [39], where the range of k-mer was set to 21, 45, 65, 85, 105 and max-rounds to 15, which was then manually corrected.The assembled cp genomes were annotated using the CPGAVAS online tool [40] with default settings.To improve annotation accuracy, multiple cp genomes of the Paphiopedilum species including P. armeniacum, P. delenatii, P. dianthum, P. micranthum, P. niveum, P. purpuratum and P. spicerianum were adopted as references, whereas the transfer RNAs (tRNAs) were confirmed by TRNASCAN-SE [41].Following annotation, the intron and exon boundaries, as well as the start and stop codons were manually corrected according to the rule of group II introns.Finally, the circle gene physical maps of the complete cp genome sequences were visualized using the CHLOROPLOT online tool [42].
We submitted each of the assembled and annotated cp genome sequences to NCBI Gen-Bank, which were designated via the following GenBank accession numbers: MN315106 for P. barbigerum, MN315107 for P. bellatulum, MN315108 for P. henryanum, MN315109 for P. hirsutissimum, and MN315105 for P. 'GZSLKY' Youyou, respectively (Table 1).An additional fourteen Paphiopedilum chloroplast genome sequences, obtained from GenBank and primarily and originally distributed across China, were also adopted for further comparative analyses (Table S1).

Genome Sequence Characteristic Analysis
Further to the annotation results obtained by CPGAVAS [40], BLASTN was employed to confirm the LSC, IRs, and SSC ranges, respectively, through self-comparison.EM-BOSS [43] was utilized to extract the sequences of each segment, whereas the lengths and GC contents were evaluated for both the whole cp genomes and each region using SE-QKIT [44].The gene compositions and categories were counted according to the annotation results, as well as the lengths of exons and introns for the splitting genes.The copy numbers of the protein coding genes (PCGs) were calculated by the THYLOSUITE software and were then checked manually.

Comparative Genomic Analysis
To compare the structures and conservation of the studied cp genomes (Table 1 and Table S1), MVISTA [45] was employed based on two alignment models: LAGAN and SHUFFLE-LAGAN.The former produces true multiple alignments though they included inversions, while the latter can detect sequence rearrangements and inversions.We analyzed the expansion and contraction of IRs (IRa and IRb) using the IRscope online program [46], and then manually modified them.
To estimate the differentiation hotspot regions in Paphiopedilum, the whole cp genomes of the selected species were aligned using MAFFT [47], followed by the calculation of the nucleotide diversity with DNASP [48], in which the length of the sliding window and step size were both set to 500 bp.To test the effectiveness and the variability of the identified hotspot regions, which were further used for the constructed the ML tree.
The codon usage frequency in Paphiopedilum was analyzed for protein-coding genes (PCGs) using MEGA [49].The relative synonymous codon usage (RSCU) was assessed to determine whether the cp genomes were under selection.Specifically, when the RSCU value was >1.00, this meant that the use of a codon was more frequent than expected, and vice versa.
For predicting the synteny between genomes from different sources, the cp genome was aligned to the nuclear genome of the closely related orchid species Dendrobium chrysotoxum, which was selected as a reference using BLASTN.The E value was set to 10-5 and the identity was set to 99%, where the genes located in the synteny regions were scanned according to the annotation results.Finally, the synteny plot was constructed using TBTOOLS [50].

Repeat Sequence and SSR Element Analyses
The SSRs (simple sequence repeats or microsatellites) were detected using ISA [51] with the minimum repeat number set at ten, five, four, three, three, and three for mono-, di-, tri-, tetra-, penta-, and hexa-repeat units, respectively.The maximum sequence lengths between two SSRs to register as a compound SSR were set to zero.The sizes and locations of interspersed repeat sequences, including forward, reverse, complement, and palindromic repeats in Paphiopedilum, were identified using REPUTER [52].The repeat positions were identified according to the parameters established under the following conditions: (1) sequence identity > 90%; (2) minimum repeat size ≥ 30 bp; and (3) Hamming distance of 3. Furthermore, tandem repeat sequences were detected using TANDEM REPEATS FINDER [53] based on the advanced model.The alignment parameters were set to 2, 7 and 7, which corresponded to match, mismatch and indels, respectively, and the maximum period size was set to 500.

Phylogenetic Analyses
To determine the phylogenetic relationships and status of Paphiopedilum species adopted in study, forty-seven cp genomes, including forty-one individuals of Paphiopedilum, two individuals of Phragmipedium, two individuals of Cypripedium and two individuals of Lilium species (Table S1), were employed to create a Maximum Likelihood (ML) tree.The accession number of these cp genomes, which were used for phylogenetic analyses, are shown in Table S1.Firstly, the complete cp genome sequence was extracted using THYLOSUITE software [54] and aligned via MAFFT [47] with default parameter settings, after which the aligned sequences were pruned using GBLOCKS [55].Subsequently, ML analysis was performed using IQ-TREE [56] with 1000 non-parametric bootstrap replicates in which GTR+G+I+G4 was adopted as the best-fit model according to the Bayesian Information Criterion (BIC) score.Finally, ITOL software was employed to visualize and refine the tree [57].

Selective Pressure Estimation
The 61 shared signal-copy PCGs across 19 Paphiopedilum species were extracted using THYLOSUIT followed by alignment in MAFFT with a codon model.The alignment files were converted and formatted using script AXTCONVERTOR.The YN00 script incorporated in KAKS_CALCULATOR software [58] was employed to calculate the synonymous (Ks), nonsynonymous (Ka), and Ka/Ks using the method of Yang and Nielsen (YN) [59], and Fisher's exact test was performed to validate the Ka and Ks values.Following screening, several protein coding genes (PCGs) were discarded if either the Ka or Ks values were unavailable prior to subsequent analysis.Finally, PCGs with Ka/Ks of >1 were regarded as candidate genes under positive selection [1].
To evaluate the variability and selectivity within a specific species, the different cp genome resource generated from different individuals within a given species were employed for the comparison analysis.The species which contained two or more available cp genome announcements in NCBI were adopted in this analysis.The complete cp genome sequences were aligned using MAFFT and the DNASP was adopted to detect the polymorphic sites (SNPs) as well as the insertions and deletions (InDels).Additionally, the KAKS_CALCULATOR program was used to estimate the selection pressure at the species level with the YN method and Fisher's exact test.

Length and Features of Cp Genome
For this study, the structural characteristics and gene contents of five newly assembled Paphiopedilum cp genomes were determined.The complete chloroplast genome sequences of five newly Paphiopedilum accessions were obtained after performing the de novo and reference-guided assembly with minor modifications.Akin to other angiosperm studies, they included a typical quadripartite structure that was comprised of an LSC domain, and an SSC domain separated by two inverted repeat (IRa and IRb) regions (Figure 1).The cp genome sizes in the five Paphiopedilum species ranged from 155,886 bp (P.henryanum) to 160,503 bp (P.'GZSLKY' Youyou).The gene GC content is plotted in the second circle, depicted in dark blue.The symbol "*" indicated that the pseudogenes in these chloroplast genomes.
There were 76 PCGs and eight rRNAs genes annotated in the five Paphiopedilum cp genomes.The number of tRNAs genes were slightly different for these five cp genomes.Specifically, there were 38 tRNAs in P. barbigerum, P. bellatulum, and P. 'GZSLKY' Youyou, respectively, while the P. henryanum and P. hirsutissimum contained only 37 tRNAs (Table 1).A total of 16 genes containing introns were observed in the studied species, of which 14 contained one intron (trnV-UAC, trnA-UGC, trnK-UUU, trnL-UAA, trnI-GAU, trnG-UCC, rps12, rps16, rpl16, rpl2, rpoC1, petD, petB and atpF), whereas two genes (clpP and ycf3) contained two introns.A total of 20 duplicated genes were detected in the IR regions, including eight tRNAs (trnH-GUG, trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA- The LSC of the five cp genomes ranged from 56.18% to 57.06% of their entirety, and ranged in size from 87,573 bp (P.henryanum) to 91,582 bp (P.'GZSLKY' Youyou).The SSC of the five cp genomes accounted for 1.82-2.34% of their entirety, and it ranged from 2831 bp (P.henryanum) to 3666 bp (P.hirsutissimum).Furthermore, P. 'GZSLKY' Youyou possessed the longest IRs (32,851 bp), while the P. barbigerum contained the smallest IRs (in terms of length) at 32,309 bp.Moreover, the length of the LSC region was correlated with the total genome size significantly (r = 0.983, p < 0.01).The average GC content of the cp genomes in the five Paphiopedilum species was similar and ranged from 35.71% to 36.17%.However, the GC content of both the LSC (r = 0.962, p < 0.01) and SSC (r = 0.932, p = 0.021) was positively associated with the total GC content.The distribution patterns of the GC content of the IRs were also significantly different with those in both the LSC (F = 15, p < 0.01) and SSC (F = 10, p = 0.45), respectively.

Comparative Cp Genomic Analysis
To perform a comparison of Paphiopedilum cp genomes, the gene structures of the five Paphiopedilum species were drafted with the other 14 Paphiopedilum species obtained from the NCBI (Table S1).The results of structural variation analysis performed by MVISTA revealed that the 19 cp genomes exhibited collinear gene organization (Figure S1).Among the compared cp genomes, there were no significant gene rearrangements observed and most PCGs regions were conserved with an identity value of >50%, except for the detection of the same relative gaps in ycf1, ycf2 and ndhB, respectively.However, the non-coding domains were less conserved in contrast to the coding regions, and the highest variation levels were detected in intergenic spacer regions (Figure S1).
To compare the gene distribution patterns between single copy regions and repeat regions in the cp genomes of Paphiopedilum species, we initially found that the identified IRb IR/LSC junctions were largely located between rps19 and rpl22, creating a copy of rps19 at the IRa/LSC border (Figure 2).The rpl22 gene spanned the IRb/LSC border in many species, with most portions being located in the LSC, and only a small portion extending to the IRb region, which ranged from 21 bp to 151 bp.However, the rps19 gene of P. concolor crossed the IRb/LSC border entirely, which indicated a clear expansion of the IR regions of this species.To compare the gene distribution patterns between single copy regions and repeat regions in the cp genomes of Paphiopedilum species, we initially found that the identified IRb IR/LSC junctions were largely located between rps19 and rpl22, creating a copy of rps19 at the IRa/LSC border (Figure 2).The rpl22 gene spanned the IRb/LSC border in many species, with most portions being located in the LSC, and only a small portion extending to the IRb region, which ranged from 21 bp to 151 bp.However, the rps19 gene of P. concolor crossed the IRb/LSC border entirely, which indicated a clear expansion of the IR regions of this species.The gene distribution patterns close to the IRb/SSC region were more complex, which could be roughly divided into several conditions as follows: (1) The ccsA were entirely expanded into the SSC region with lengths of from 471 to 496 bp; (2) The ccsA and rpl32 were in the IRb region, and the trnL was in the SSC region; (3) The ccsA and rpl32 were distributed in the IRb and SSC regions, respectively; (4) The psaC were in IRb region, whereas the rpl32 or trnL were in the SSC region.Interestingly, two copies of the ndhD gene spanned the junctions of both Irb/SSC and Ira/SSC, respectively, in P. armeniacum, whereas the other species such as P. barbigerum and P. bellatulum contained the pseudogene fragment of ndhD only in the SSC region.As expected, the psbA gene located in the LSC region close to the border of Ira/LSC was the general starting point of the cp genome (Figure 2).
To determine divergent hotspot regions in the 19 Paphiopedilum species, we compared the Pi values of the cp genomes using DNASP software.The 19 Paphiopedilum species Pi values ranged from 0 to 0.0516 (Figure 3).The mean nucleotide diversity value of the whole cp genome was 0.00962, while the corresponding values of the LSC, SSC, and IRs were 0.01194, 0.31009, and 0.00610, respectively.Overall, the single copy regions Pi values were higher than those of the repeat regions, suggesting that the single copy regions, particularly the LSC, might be regarded as potential divergent hotspots of Paphiopedilum.The mean Pi values of the intergenic and genic regions were also calculated.The sequence divergence was 0.02271 in the intergenic regions, which was higher than that in the genic regions (0.00732) of these chloroplast genomes.This was consistent with the results of MVISTA analysis (Figure S1).Furthermore, we eventually identified 12 specific mutational hotspot regions (eight in LSC, two in IRs, and two crossing the IR/SSC border) including trnE_UUC-trnT_GGU, rps16, psbK, psaC, rps15, ycf4, cemA, clpP, rpl33, and the pseudogene fragments ndhD, etc., which exhibited remarkably higher Pi values (>0.02463).Furthermore, the identified hotspot regions were employed to reconstruct the ML tree, jointly for all regions and separately for each region, respectively.The ML tree constructed by the concatenated regions maintained the similar topological structure with the tree constructed by the whole cp genomes, while the four identified hotspot regions including trnK-rps16, trnE_UUC-trnT_GGU, clpP and the psaC-rps15 also performed preferable discrimination ability for the classification of the studied Paphiopedilum species (Figure S2).
The codon usage frequency of PCGs for 19 Paphiopedilum species was estimated.The Bacterial and Plant Plastid Code was adopted in this analysis.The identified PCGs encoded by codons ranged from 22,135 to 25,525, which corresponded to P. malipoense and P. emersonii, respectively.UAA, UAG and UGA were considered termination codons (Table S2).For Paphiopedilum, we identified that the UUA codon that encoded leucine (L) presented the highest RSCU value (1.87), whereas the GAC codon, encoding for aspartic acid (D) had the lowest RSCU value (0.36).The AAA codon, encoding for lysine (K), was the most frequent amino acid with >1000 occurrences in most of the 19 Paphiopedilum species, while the UGC codon encoding for cysteine (C) had the lowest usage (71 to 89) among these species (Figure 4).In addition to the general AUG initiation codon, other types of initiation codons were found for several PCGs in some species, such as ACG, UUG, and AUA in the rpl2 gene, AUU in the petN gene, AUU in the ycf1 gene, as well as CUG and GUG in the rps19 gene (Table S3).
To better understand the origins and evolution of the cp genome and its relationship with the nucleus, the homologous genes shared by the cp and nuclear genomes were detected and identified.Due to the lack of genome data resources for Paphiopedilum, the nuclear genome data of closely related Dendrobium chrysotoxum species were adopted as the reference genome.The results revealed that the cp and nuclear genomes had high synteny.The colinear homologous segment counts ranged from 1429 (P.bellatulum) to 1525 (P.'GZSLKY' Youyou), and these similar fragments were primarily located on chromosome 4 (2589 hits), chromosome 12 (1917 hits), chromosome 13 (2350 hits), chromosome 15 (2097 hits), and chromosome 16 (3082 hits).The codon usage frequency of PCGs for 19 Paphiopedilum species was estimated.The Bacterial and Plant Plastid Code was adopted in this analysis.The identified PCGs encoded by codons ranged from 22,135 to 25,525, which corresponded to P. malipoense and P. emersonii, respectively.UAA, UAG and UGA were considered termination codons (Table S2).For Paphiopedilum, we identified that the UUA codon that encoded leucine (L) presented the highest RSCU value (1.87), whereas the GAC codon, encoding for aspartic acid (D) had the lowest RSCU value (0.36).The AAA codon, encoding for lysine (K), was the most frequent amino acid with >1000 occurrences in most of the 19 Paphiopedilum species, while the UGC codon encoding for cysteine (C) had the lowest usage (71 to 89) among these species (Figure 4).In addition to the general AUG initiation codon, other types of initiation codons were found for several PCGs in some species, such as ACG, UUG, and AUA in the rpl2 gene, AUU in the petN gene, AUU in the ycf1 gene, as well as CUG and GUG in the rps19 gene (Table S3).To better understand the origins and evolution of the cp genome and its relationship with the nucleus, the homologous genes shared by the cp and nuclear genomes were detected and identified.Due to the lack of genome data resources for Paphiopedilum, the nu- In contrast, chromosomes 2, 3, 5, 7, 8, and 10 exhibited relatively low synteny, with hit numbers of <1000 (Table S4).Due to the high conservatism of the cp genome, the five newly assembled cp genomes were employed to represent Paphiopedilum for the screening of genes shared by the cp and nuclear genomes, respectively (Figure 5).A total of 21 categories of gene fragments were detected among the cp and nuclear genomes, of which ndhF, ycf1, and ycf2 were collectively observed in the five cp genomes.For the nuclear genome, the shared gene fragments were primarily distributed between chromosomes 12, 14 and 16, and were primarily members of the ndh and ycf gene families.

Repeat Structure and SSR Analysis
Using MISA analysis, a total of 2714 SSRs were identified in Paphiopedilum cp genomes.For each Paphiopedilum species, the numbers of SSRs ranged from 117 (P.dianthum) to 174 (P.villosum).The A/T mononucleotide repeats were the most abundant SSRs, which accounted for ~40.16% of the total SSRs in Paphiopedilum, while the G/C repeats were relatively rare (1.77%).The number of penta-and hexanucleotides SSRs were slightly less than the other repeats, such as di-, tri-, and tetranucleotides (Table 3).In addition, the LSC regions contained the highest percentage of SSRs (70.41%), followed by the IR (27.11%) and SSC regions (2.48%).Among a total of 2714 SSRs, 1834 loci were in the intergenic

Repeat Structure and SSR Analysis
Using MISA analysis, a total of 2714 SSRs were identified in Paphiopedilum cp genomes.For each Paphiopedilum species, the numbers of SSRs ranged from 117 (P.dianthum) to 174 (P.villosum).The A/T mononucleotide repeats were the most abundant SSRs, which accounted for ~40.16% of the total SSRs in Paphiopedilum, while the G/C repeats were relatively rare (1.77%).The number of penta-and hexanucleotides SSRs were slightly less than the other repeats, such as di-, tri-, and tetranucleotides (Table 3).In addition, the LSC regions contained the highest percentage of SSRs (70.41%), followed by the IR (27.11%) and SSC regions (2.48%).Among a total of 2714 SSRs, 1834 loci were in the intergenic regions (60.30%), while only 880 loci were in the genic region.A total of 20 shared SSR motifs were identified among these cp genomes.Several unique SSR motifs existed only in particular species, of which most were penta-or hexanucleotides repeats.The P. venustum possessed the largest number of unique SSR motifs, including TCTAT, ATTTG, TTGTA, TCAATA, and ATATG (Figure S3).In addition to SSRs, we also further used REPUTER and TANDEM REPEATS FINDER to identify the repeat sequences of the cp genomes (Table 3).Four categories of interspersed repeats containing forward, reverse, complement, and palindromic repeats were detected, respectively.A total of 10,347 interspersed repeats were identified in Paphiopedilum, with the forward repeats being the most abundant (3457), followed by the palindromic repeats (3030).P. armeniacum possessed the largest number (909) of interspersed repeats, while P. dianthum contained the minimum number (275) of interspersed repeats.Further, a total of 3578 tandem repeats were found in Paphiopedilum.The lengths of the tandem repeats were primarily distributed within the range of from 1 bp to 30 bp, followed by 31 bp to 60 bp; however, the number of tandem repeats over 90 bp were relevantly rare.

Phylogenomic Analysis
To construct the phylogenetic relationships of the forty-seven cp genomes (species names and GenBank accession numbers can be found in Table S1), we employed an ML phylogenetic tree to determine the phylogenetic positions.These cp genomes included forty-one Paphiopedilum individuals, two Phragmipedium individuals, two Cypripedium individuals and two Lilium individuals as an outgroup (Figure 6A).Simultaneously, the existence or absence of protein coding genes in each species was also identified and counted (Figure 6B).The gene loss in species of the same genus was similar.Most genes contained one copy whereas some protein coding genes possess two copies including rps19, rps12, rps7, ycf2, rpl2, rpl23, ycf1, psaC and rps15.
After pruning, the alignment results were further employed to construct the ML tree.The results indicated that the topologies of the ML tree basically yielded an anticipated structure.The two Lilium species, as the outgroup, were located in the basal position of the ML tree.The Orchidaceae species employed in the phylogenic analysis could be divided into three major clades corresponding to Cypripedium, Phragmipedium, and Paphiopedilum.With Cypripedium being the basal group, the Phragmipedium and Paphiopedilum were gathered into a same clade, which reflected that the two genera had a relatively close relationship.The Paphiopedilum species were further classified into three subgenera: Subg.Paphiopedilum, Subg.Brachypetalum and Subg.Parvisepalum.The Subg.Paphiopedilum and Subg.Brachypetalum were closer in relationship with 100% bootstrap values.Furthermore, the Subg.Paphiopedilum, which contained the greatest number of studied species, could be divided into the three Sections corresponding to nine species (twenty individuals) in Sect.Paphiopedilum, four species (five individuals) in Sect.Barbata, and two species (four individuals) in Sect.Pardalopetalum, and each Section also had a relatively high support value for 100% (Figure 6A).Similar topologies also appeared in the ML tree constructed based on the identified divergence hotspots regions (Figure S2).The heat map of the numbers of the protein coding genes.Colors reflect the copy numbers of these genes in each species.The pseudogene was colored by brown blocks.The order of the individuals from top to bottom in the heatmap was consist with the ML tree.

Selective Pressure Analyses
We estimated the Ka/Ks ratios at the species level by concatenating all 61 shared PCGs into a super-matrix.For Paphiopedilum species, the Ka/Ks ratios were ~0.52, which inferred that at the complete chloroplast protein level, the Paphiopedilum species have been experienced to a strongly purifying selection (Figure 7 and Table S5).
The variation and selectivity were also explored within a given species (Table S6).The P. wardii contained the largest number of SNPs (5632) and InDels (708), while the P. parishii, P. tigrinum and P. dianthum presented a relatively lower differentiation which occurred only 1, 1, and 2 SNPs and 10, 3, 12 InDels, respectively.The results of selection pressure analysis showed that most species contained a Ka/Ks value < 1 with the significant p value (<0.05), suggesting the purification selection.While the P. dianthum P. tigrinum and P. henryanum possessed high Ka/Ks value correspond to 1.3390, 0.9553, and 0.9824, respectively, which implied potential positive selection in these given species (Table S6).

Discussion
The Orchidaceae family comprises one of the richest species in the flowering plants that includes myriad endangered, rare, or threatened species [20,60].As a fascinating genus of Orchidaceae, Paphiopedilum is typically referred to as the "tropical slipper or-chid" with high horticultural and ornamental value due to its typical flower traits of its shape [61].However, because it growths in fragile and peculiar environmental conditions, it is extremely vulnerable to human excavation and habitat destruction [9][10][11][12][13][14][15][16][17][18][19]62].
Recently, there have been many genetic resources, cp genome sequences, and analyses of genetics to help explain comparation, phylogenetic relationships and the evolutionary process of Orchidaceae [11,13,14,[63][64][65].In this study, we assembled five cp genomes of the Paphiopedilum species (Table 1 and Figure 1) and compared the complete cp genomes of nineteen Paphiopedilum species.They exhibited classic quadripartite structures and included 119 to 127 genes consisting of 74 to 81 protein coding genes, 37 to 38 tRNAs, and four rRNAs.In these cp genomes, 152,130 bp (P.tigrinum, MN587804) to 169,786 bp (P.wardii, MH191341) were obtained in length, whereas the overall GC contents ranged from 34.00% (P.wardii, MH191341) to 36.17% (P.hirsutissimum, MH191341) with an average value of 35.59% (Table 1, Tables S1 and S6).A cp genome size of nine Paphiopedilum species was reported, ranging from 154,908 bp to 161,300 bp [9], 159,795 bp to 160,040 bp (Coelogyne) [63], and 197,815 bp to 212,668 bp (Cypripedium subtropicum, the largest cp genome in Orchidaceae) [64], which indicate that the size of cp genomes in orchid species had extremely diversity [11,[63][64][65].Although it was apparent that their structures and organization exhibited relatively high conservation, the GC contents of Paphiopedilum cp genomes were relatively lower than those of other Orchidaceae species [66][67][68], which might be interpreted by natural selection [69].However, in genus Cypripedium, interestingly, the GC content (28.2% and 30.5%) of cp genomes are also much lower than our present cp genomes [64].
From different environments, the closely related plant species marked differences GC content in the DNA sequences, which had a direct effect on the protein amino acid sequences in their typical environments [70].Low GC contents genes were more easily transcribed than those with the converse, as GC pairs possess three hydrogen bonds, which makes them more stable than AT pairs with two hydrogen bonds [71].Consequently, selective pressures of the unique habitats (humidity and shading) of Paphiopedilum shade plants resulted in a lower overall GC content in their cp genomes.
In addition to the GC contents, the lengths between these cp genomes were variable (Tables 1 and S1, and Figure 1).In general, the lengths of IR regions often determined the total lengths of cp genome sequences [72].Two inverted repeats (IRa and IRb) could be conserved in the cp genome against the main structural rearrangements [73,74].Furthermore, the intervals of the four cp genome regions played a critical role in the evolution of some plant species via homologous recombination-induced repair mechanisms [75].The structure of the Paphiopedilum cp genome was conservative with no obvious region rearrangements detected, which was consistent with previous studies [9].Thus, we speculated that the variations in the cp genome length in Paphiopedilum might be induced by IR expansion and SSC contraction, because of the instability and shifting of IR/SSC boundaries (Figure 2).
In some angiosperms, dramatic changes in the IR regions occurred.For example, the length of the IR region in Pinus thunbergii was only 495 bp [76], while in Pelargonium hortorum it was ~76 kb [72].In general, the average cp genome of angiosperms was ~151 kb and the IR region was ~25 kb [77,78].However, the cp genomes under study for Paphiopedilum showed both the longer total length (~158 kb) as well as IR length (~33.8 kb) (Tables 1 and S1).A relevantly large (several kb) inverted repeats (IRs) expansion had also been reported in other angiosperm lineages, including Acacia [79], Inga [79], Erodium [80], Passiflora [81], and Pelargonium [82].The plastome expansion was also common in orchid species, such as Cypripedium [64] and Cymbidium.In contrast to the expansion of the IR domains in cp genome, the SSC regions of Paphiopedilum species were greatly reduced in sizes and the gene numbers (Tables 1 and S1), and even peculiar genes (e.g., psaC and ndhD) in SSC region, which were transferred from SSC to the IR regions in P. parishii, P. dianthum, P. armeniacum, and all five of the newly assembled species (Figure 2).In other Orchidaceae genus such as Coelogyne, the gene of rpl22 and ycf1 had shortened, whereas the length of the ndhF gene had increased, which was also caused by contraction of the SSC and the expansion of IRs [63].While in other Orchidaceae genus Polystachya, the rpl22 gene in the LSC region was expanded by 23-66 bp to the IRB region caused by the IR expansion [65].
With the shifts in the boundaries of the IR region and SSC region, the contract of the SSC regions might be related to the ndh genes loss and several genes transfer from the SSC region to the IR region.In particular, the most ndh genes were pseudogenized or lost in Paphiopedilum, except for P. emersonii (ndhB), P. parishii (ndhB and ndhD), and P. concolor (ndhK) (Table 2, Figure 6).Niu et al. [83] reported that the ndh genes were strongly associated to the stability of the IR/SSC junction.Interestingly, the species that contained more pseudogenized ndh genes tended to gather into the basal clade of Paphiopedilum (Figure 6), which could infer that the loss or pseudogenization of ndh genes likely occurred much earlier, as the footprints of pseudogenic ndh genes were repeatedly and persistently retained in the ancestors.
Furthermore, the degradation and pseudogenization of the ndh genes were found in other Orchidaceae species (e.g., Erycina pusilla [84], Vanilla planifolia [85], and Liparis japonica [67].The ndh gene variations were found in IR boundaries in Paphiopedilum species, and were considered to be attributed to the recombination of IRs.[11].Although the pseudogene DNA did not encode proteins, it might still be functional and play a regulatory role akin to other non-coding DNA family genes [86].In addition to the border shifts of IR/SSC, the underlying mechanisms of the degradation of ndh genes in Paphiopedilum warranted further in-depth exploration. Lin et al. [87] proposed that the complexes of ndh loss likely raised the transition of its life history from photoautotrophic to heterotrophic.We speculated that the degradation of ndh genes in Paphiopedilum was the result of adaptive evolution to a low light environment.As with the results of other Orchidaceae studies (as is typical of epiphytes or lithophytes of same the species in Paphiopedilum living beneath dense canopies with insufficient light), Paphiopedilum could utilize low light only within a relevantly narrow ecological amplitude of light adaptation, which exhibited the characteristics of shade plants.
Furthermore, the transfer of genes to the nucleus might be another reason for the gene loss and pseudogenization in the cp genome [87][88][89].In this study, the ndh and ycf gene fragments frequently occurred in the nuclear genome (Figure 5), which indicated that the gene transfer events might have occurred during the evolutionary process of Paphiopedilum cp genomes.Gene transfer in Paphiopedilum might be explained by the decreased demands on photosynthesis and plastid translational capacities due to the adaptation to the unique habitat, which increased the success rate of gene transfer from plastid to the nucleus [80].
Codons were beneficial genetic information for the transmission, and were used in the genomes evolution as their nucleic acid sequences and proteins were linked [90].The usage of codons was different in different species [91] due to various factors such as codon hydrophilicity, gene length, abundance of tRNA, base composition, natural selection, and gene expression rate [92].The average number of codons (24,239) in Paphiopedilum was relevantly low compared with other species [93], which might have been due to the loss and pseudogenization of certain protein coding genes (CDS).Additionally, the results of RSCU analysis indicated that G or C was more inclined a lower nucleotide frequency (RSCU < 1.0) than A or U (RSCU > 1.0) at the third codon position.The exception was UUG (RSCU = 1.87) (Figure 4, Table S2), which was similar to that in other angiosperm cp genome research [4,93].
Moreover, the initiation codons in Paphiopedilum were almost always AUG, whereas some genes contained different codon types such as ACG for rpl2 or GUG for rps19 (Table S3), which was consistent with other monocots studies [94].Earlier investigations showed that initiation codons interfered with translation efficacy [95,96], and C to U RNA editing events commonly existed in higher plant chloroplasts [97].This might have resulted in the biased codon usage of Paphiopedilum, in conjunction with the environmental adaptation caused by natural selection.
The repeat sequences of chloroplast could be provided useful resources to study genome recombination in plants.The identified SSRs (microsatellites) in the Paphiopedilum cp genome exhibited a high copy number diversity, and were significant molecular markers with transferable capacities across species and genera for population genetics and evolutionary studies [93,98].Among SSRs in Paphiopedilum cp genomes, the mononucleotide A/T repeat units was identified as having the highest numbers, and the proportion was ~40.16% (Table 3), which was consistent with the relevantly low GC content in the cp genome.Most of the SSRs were in the intergenic regions (60.3%), which might have been owing to the fact that there was a higher mutation rate in the intergenic rather than the coding regions.Certain unique SSRs were also useful for the identification of Paphiopedilum species (Figure S3).
The variation, differentiation, and diversity of the cp genome were explored by comparing the sequences through polymorphism analysis (Figures 3 and S1).As predicted, being largely consistent with recent studies [4,93], the IR regions had a much lower variation (0.00610) than the SSC (0.31009) and LSC (0.01194) regions, which was likely due to copy correction between IR sequences via gene conversion and replication [99].The higher genetic diversity of SSC might be the result of the variable lengths (663 bp to 5916 bp) of the SSC regions.Furthermore, it was found that divergence in the intergenic regions was higher than that of the genic regions, as is seen in most angiosperms (Figure S1).By comparison, some divergent hotspot regions identified in our study were coincident with the results of Sun et al., 2021 [9], such as the trnK_UUU-rps16, psbK-psbI, trnE_UUC-trnT_GGU, clpP and psaC-rps15.Vu et al. 2020 compared the chloroplast genome of Papiopedilum delenatii with that of other orchids and also found the similar divergent hotspots region including rps16, trnE_UUC-trnT_GGU, psaI, clpP, and psaC [11].These identified divergent hotspot regions, especially the four highly credible regions (trnK-rps16, trnE_UUC-trnT_GGU, clpP, and psaC-rps15) which contained the similar ML topology with the whole cp genome and the concatenated regions (Figure S2), could be useful as genetic markers for the further studies on the phylogenetic relationships, DNA barcodes, genetic population, and evolution of the genus Paphiopedilum and orchid species.
The lower Ka/Ks ratios at the cp genome level within the Paphiopedilum species indicated that most genes were subjected to a purifying selection to retain conserved functions (Figures 7 and S4).Environmental factors, such as solar radiation and temperature, can impact mutation rates, metabolism, and growth rates [100,101].Previous studies revealed that the distribution patterns and evolution of Paphiopedilum were significantly correlated with elevation, solar radiation, temperature (mean diurnal range, annual temperature range), and precipitation (seasonal precipitation, warmest quarter precipitation) [7].One of the most prevalent forms of natural selection, purifying selection, constantly sweeps away deleterious mutations in populations [1].
The purifying selection of most genes within the Paphiopedilum species are likely the evolutionary result of the preservation of its adaptive characteristics.However, the positive selection was also found in several specific genes (accD, matK, psbM, rpl20, rps12, ycf1, and ycf2).In most cases, genes related to a specific environment were typically assumed to be under positive selection [102].These genes might serve as candidates that contribute to the adaptive evolution of Paphiopedilum.Therein, matK is commonly used as a phylogenetic signal that reveals evolutionary relationships due to its high substitution rates [103].
The signal of positive matK selection was also found in some hydrophytes, which exhibited low light adaptations [1].The ycf2 gene was the largest chloroplast gene reported in angiosperms for assessing sequence variations and evolution in plants [104].The positive selection of ycf2 was also found to be involved in the adaptation of other species [105], and the variation in psbM genes was related to the senescence of Pisum sativum [106].Overall, the identification of these positive selective genes provided valuable resources for further research into the adaptive evolution of Paphiopedilum.
Chloroplast genomes have been widely used in studies of phylogenetics, evolutionary biology, and population genetics of Orchidaceae species, owing to this, the family had the largest number of species in the world [11,13,14,[24][25][26][27]63,64].The genus Paphiopedilum includes over 66 species in the world; however, the other genus had a greater number of species such as Polystachya (~240 species) [65], Coelogyne (over 200 species) [63].Paphiopedilum, called "lady of Venus", is one of the most cultivated horticulture plants worldwide because of its typical beautiful flowers, belonging to the subfamily Cypripedioideae of Orchidaceae, firstly described by Pfitzer in 1886 [9,13,14,64,65].In the subfamily Cypripedioideae, Paphiopedilum, Cypripedium and Phragmipedium had a relatively closer relationship and had been divided into three distinct groups until the nineteenth century.
The phylogenetic relationships and classification of Paphiopedilum are quite complex and have been studied for a long time [12,21,22,34].The newly assembled cp genomes offered additional data and resources for Paphiopedilum phylogenetic research (Table 1).In the current study, the phylogeny was basically consistent with the recent phylogenies obtained from partial organelle DNA markers (Figure 6) [21,23] or specific cp genome genes such as matK and rbcL [14].At the genus level, Paphiopedilum and Phragmipedium presented a closer relationship (Figure 6), which was also consistent with the other phylogenetic studies of Paphiopedilum based on the cp genome [9,12,21].While within the studied genus Paphiopedilum, 41 Paphiopedilum individuals were classified into three subgenera: Subg.Paphiopedilum, Subg.Brachypetalum and Subg.Parvisepalum with 100% bootstrap values (Figure 6A).The Subg.Paphiopedilum was further divided into the three sections corresponding to Sect.Barbata, Sect.Paphiopedilum and Sect.Pardalopetalum, respectively.These results were also supported by the results of ML analysis by using the identified divergence hotspot regions (Figure S2).However, Paphiopedilum taxa is complex, and was divided into different subgenera based on morphological, molecular data, and diversity genetic markers [9][10][11]13,14,[63][64][65].There are still some unresolved phylogenetic questions that still need to be resolved in Paphiopedilum due to the genome size, gene content, expansion, gene repeats, GC content, and hybridization [7][8][9][10][11][12][13][14]21,100].
The variation and selectivity explored within the given species showed that Paphiopedilum species majorly underwent the purifying selection except for P. dianthum, P. tigrinum and P. henryanum (Table S6).Within species, some cp genomes were not clustered together within species in the ML phylogenetic tree, which reflected the high genetic variations that exist among these individuals (such as, 5,632 SNPs between two cp genomes of P. wardii, 1958 SNPs between two cp genomes of P. bellatulum, and 2919 SNPs between two cp genomes of P. concolor).We found that the total length of cp genomes had a large difference within species, including 10,947 bp in P. wardii (MN587760 and MH191341), 5498 bp in P. bellatulum (MZ150829 and MN315107), and 4794 bp in P. concolor (MN587764 and MH191340), respectively.There were many mutation events (insertions, deletions, substitutions, and inversions), gene loss, inverted repeats expansion, and gene rearrangements have been found in cp genomes, which impacted the classification status in the phylogenetic tree analyses [6], especially for this quite complex genus Paphiopedilum [9,11,12,14].The limited number of available cp genomes within the specific species limits the study of intraspecific variation and evolution.In the future, with the announcement of more cp genomes, it is possible that population genomic research of the given species will reveal more evolution and variation mechanisms.

Conclusions
For this study, five cp genomes of Paphiopedilum species were newly sequenced, assembled and annotated.Subsequently, in combination with other previously reported cp genomes, we explored their comprehensive characteristics, including length, GC contents, and gene counts.The results indicated that these cp genomes shared similar genome structures, whereas significant IR expansion and SSC contraction were observed.Akin to the results of other studies, the intergenic regions contained higher variabilities, while the IR regions exhibited lower divergence due to copy correction between IR sequences via gene conversion and replication.The twelve divergent hotspot regions, as well as the identified SSRs, could be available for the development of molecular markers and plant authentication.
The low GC content, gene transfer, and degradation of ndh genes might be the result of the adaptive evolution of cp genomes in Paphiopedilum.Furthermore, at the species level, the Ka/Ks ratios indicated the Paphiopedilum species were subjected to purifying selection.At the gene level, we observed that some specific genes were under positive selection, such as matK, ycf2, and psbM, which indicated that they were significant potential candidate genes during the Paphiopedilum evolutionary process.These results might be the adaptive responses to their unique habitats.Our study not only provided valuable genomic resources for the further utilization, development and in-depth research into the endangered Paphiopedilum germplasm, but also provided insights for the investigation of the adaptive evolution of chloroplast genomes in Orchidaceae.

Horticulturae 2022, 8 , 27 Figure 1 .
Figure 1.The whole chloroplast genome circle map of the five Paphiopedilum species.The gene names and their codon usage bias are shown on the outermost layer.Directed by arrow, the outside genes are transcribed clockwise, the inner genes are transcribed counter−clockwise, respectively.The gene GC content is plotted in the second circle, depicted in dark blue.The symbol "*" indicated that the pseudogenes in these chloroplast genomes.

Figure 1 .
Figure 1.The whole chloroplast genome circle map of the five Paphiopedilum species.The gene names and their codon usage bias are shown on the outermost layer.Directed by arrow, the outside genes are transcribed clockwise, the inner genes are transcribed counter−clockwise, respectively.The gene GC content is plotted in the second circle, depicted in dark blue.The symbol "*" indicated that the pseudogenes in these chloroplast genomes.

Figure 2 .
Figure 2. Comparison of the borders of the LSC, SSC, and IR regions between 19 chloroplast genomes.JLB, JSB, JSA, and JLA denoted the junction sites of LSC/IRb, IRb/SSC, SSC/Ira, and IRa/LSC, respectively.

Figure 2 .
Figure 2. Comparison of the borders of the LSC, SSC, and IR regions between 19 chloroplast genomes.JLB, JSB, JSA, and JLA denoted the junction sites of LSC/IRb, IRb/SSC, SSC/Ira, and IRa/LSC, respectively.

Figure 3 .
Figure 3.Nucleotide diversity of different regions of Paphiopedilum chloroplast genomes.Red line represents the top 5% threshold of the average nucleotide diversity.The genes located around the hot spots of differentiation are marked with arrows.

Figure 3 .
Figure 3.Nucleotide diversity of different regions of Paphiopedilum chloroplast genomes.Red line represents the top 5% threshold of the average nucleotide diversity.The genes located around the hot spots of differentiation are marked with arrows.Horticulturae 2022, 8, x FOR PEER REVIEW 12 of 27

Figure 4 .
Figure 4. Codon usage bias reflected by RSCU values of codons for each amino acid.

Figure 4 .
Figure 4. Codon usage bias reflected by RSCU values of codons for each amino acid.

Figure 5 .
Figure 5. Circle plot for genome-wide synteny analysis between chloroplast and nuclear genomes.Colored lines representing homologous gene fragments (E value: 10-5, Identity > 99%) share the two different types of genomes.Different colors correspond to each specific chromosome.The lengths of these genomes were scaled and standardized.

Figure 5 .
Figure 5. Circle plot for genome-wide synteny analysis between chloroplast and nuclear genomes.Colored lines representing homologous gene fragments (E value: 10-5, Identity > 99%) share the two different types of genomes.Different colors correspond to each specific chromosome.The lengths of these genomes were scaled and standardized.

Figure 6 .
Figure 6.The Maximum Likelihood (ML) phylogeny tree and the heat map of the copy's numbers in protein coding genes for 47 individuals including 41 Paphiopedilum individuals, two Cypripedium individuals, two Phragmipedium individuals and two Lilium individuals.(A) The ML tree of 41 Paphiopedilum individuals plus 6 taxa constructed by using the whole cp genome sequences.Numbers on nodes of the ML tree indicated bootstrap values.The symbols i, ii, and iii were used to indicate the Subg.Paphiopedilum, Subg.Brachypetalum and Subg.Parvisepalum, respectively.The sections within the Subg.Paphiopedilum were also annotated by using the blue bars.(B)The heat map of the numbers of the protein coding genes.Colors reflect the copy numbers of these genes in each species.The pseudogene was colored by brown blocks.The order of the individuals from top to bottom in the heatmap was consist with the ML tree.

Figure 7 .
Figure 7. Selective pressure of shared protein coding genes in Paphiopedilum.(A) The x−axis represents the Ks (synonymous substitutions) value, while the y−axis represents the Ka (non-synonymous

Figure 7 .
Figure 7. Selective pressure of shared protein coding genes in Paphiopedilum.(A) The x-axis represents the Ks (synonymous substitutions) value, while the y-axis represents the Ka (non-synonymous substitutions) value.Red dots represent gene pairs under positive selection (Ka/Ks > 1), and blue dots represent gene pairs under purifying selection (Ka/Ks < 1).(B) Overall distribution pattern of the Ka/Ks value of each gene in Paphiopedilum.

Table 1 .
The chloroplast genomic features of the five Paphiopedilum species newly assembled in this study.

Table 2 .
Genes encoded in the five Paphiopedilum chloroplast genome.

Table 3 .
The microsatellites and long repeat sequences in Paphiopedilum.