Comparative Plastome Analyses of Ephedra przewalskii and E. monosperma (Ephedraceae)

Ephedra species were erect, branching shrubs found in desert or arid regions worldwide as the source of ephedrine alkaloids. In this study, the complete chloroplast genome of Ephedra przewalskii and E. monosperma on the Qinghai-Tibet Plateau were sequenced, assembled, and annotated. Compared with the other four published Ephedra species, the chloroplast genomes of Ephedra species were highly conservative, with a quadripartite structure. The length of the chloroplast genome was 109,569 bp in E. przewalskii with 36.6% GC and 109,604 bp in E. monosperma with 36.6% GC. We detected 118 genes in both Ephedra species, including 73 PCGs, 37 tRNA genes, and eight rRNA genes. Among them, the ndh family genes were lost, which could be used to study the phylogeny and genetic diversity of the genus Ephedra, combined with multiple highly variable intergenic spacer (IGS) regions. Codon usage preference of Ephedra species was weak. The ratio of non-synonymous substitutions and synonymous substitutions was low, showing that the PCGs of Ephedra may be under the pressure of purifying selection. ML and BI analysis showed similar phylogenetic topologies. Ephedra species clustered together in a well-supported monophyletic clade. E. przewalskii and E. monosperma were not gathered in one clade, consistent with the classification system by Flora of China. This study reveals differences in the chloroplast genomes of Ephedra, providing valuable and abundant data for the phylogenetic analysis and species identification of Ephedra.


Introduction
The genus Ephedra (Ephedraceae) is mainly distributed in desert and arid regions, with approximately 40 species worldwide. 14 species of Ephedra are distributed in China [1]. Potential divergence factors of Ephedra include the uplift of the Qinghai-Tibetan Plateau (QTP) and the Asian aridification [2]. Since Linnaeus established Ephedra in 1753, several scientists have held diverse views on classification system revisions within this genus [3][4][5][6][7].
The species of Ephedra are known for their ecological and medicinal values. Due to a well-developed root system with drought-and cold-resistant characteristics, it can be used in sand fixation and soil conservation programs. It has also long been an important medicinal plant in China. Containing a plethora of chemical components, it can be used to treat a variety of diseases including cold, asthma, hay fever, and urticaria [1,8]. In addition, Ephedra is a good source of ephedrine alkaloids that can be used to make weight-loss medicine and illicit drugs such as methamphetamine in Western countries [9]. Besides, extracts of E. sinica may be useful in the treatment of COVID-19. [10].
E. przewalskii has a much lower ephedrine content than other Ephedra species. However, it contains synthesis pathways for stilbene, diarylheptanoid, and other medicinal components. Stilbene has a variety of biological activities, including disease-resistance, anti-oxidation, anti-tumor, and anti-inflammatory activity. Also, the diarylheptanoid has anti-tumor activity [11,12]. Direct ionization mass spectrometry or ITS2 barcodes are commonly used to identify Ephedra species [13,14]. Studies of Ephedra have concentrated on transcriptome data mining, the medicinal value of its chemical compounds, and the classification of morphological characteristics [11][12][13][14]. Despite numerous previous studies, it is difficult to distinguish Ephedra species based on morphological features [15]. However, several studies, most of which did reveal the phylogenetic relationship between the Ephedra species used chloroplast DNA fragments or nucleus DNA (ITS sequences) [16,17]. Also, some other studies employed complete plastid genome sequences. These studies provided new insights and ideas for dealing with the phylogenetic issues of Ephedra [18,19].
The chloroplast, having a small genome and being inherited uniparentally, is an essential organelle for photosynthesis [20]. Fairly conservative in their structure and sequence plastomes have evolved into an effective tool in plant evolutionary and systematic studies [21,22]. Most land plants have a chloroplast genome that is 120 to 160 kb in size. It has a quadripartite structure, consisting of two single-copy regions (LSC, SSC) and two inverted repeat regions (IRa, IRb) [23]. According to literature records, the ancient cyanophyte endosymbiosis had chloroplasts with a number of functional genes. However, there have been gene loss or transfer events during the evolution of chloroplasts, such as the absence of ndh (NADH dehydrogenase) family genes [24]. This was observed in the other species [25,26]. With the rapid advancement of sequencing technology and its decreasing cost, the demand for chloroplast genomes sequencing has been increasing. Some medicinal plants, including Dipterygium glaucum, Cleome chrysantha, Bupleurum sikangense, and Ephedra equisetina have been sequenced with massive chloroplast genome data obtained [25,27,28]. Moreover, chloroplast genome sequences have been widely used in phylogenetic and population genetic studies [29][30][31].
In this study, we sequenced and annotated the chloroplast genomes of E. przewalskii and E. monosperma. We also carefully compared them with the other published chloroplast genomes from Ephedra (E. intermedia, E. equisetina, E. foeminea, and E. sinica) to detect the differences in the chloroplast genome. The analysis of chloroplast genome structure, long repeats, short repeats, codon preference, prediction of potential RNA editing sites, and analysis of the adaptive evolution by selective pressure analysis of protein-coding genes contributes to a better understanding of the differences in the chloroplast genome of Ephedra species. It provides valuable and abundant data for the phylogenetic analysis and species identification of Ephedra.

DNA Extraction and Sequencing
Fresh leaves of E. przewalskii and E. monosperma were sampled in Mangai (Geographic coordinates: 38 • (Table S1). The fresh leaves were cleaned with 75% alcohol and ddH 2 O, quickly placed in liquid nitrogen, then transferred to -80 • C for storage after returning to the laboratory. Voucher specimens (E. przewalskii: QXA160729005; E. monosperma: Chensl-0514) were deposited in the Qinghai-Tibetan Plateau Museum of Biology (HNWP). Total genomic DNA was extracted from fresh leaf tissue by the modified cetyltrimethylammonium bromide (CTAB) method [32]. Qubit Fluorometer (Thermo Fisher, Asheville, NC, USA) was used to estimate DNA concentration. Quality analysis of extracted DNA was evaluated using agarose gel electrophoresis and completed library preparation, following the manufacturer's instructions. E. przewalskii and E.monosperma were sequenced on Novaseq 6000 platform (Illumina Inc., San Diego, CA, USA) with 150 bp paired-end (PE) sequencing.

Genome Assembly and Annotation
Raw data were filtered using Trimmomatic v. 0.33 [33] and FastQC v. 0.11.8 [34] by discarding low-quality reads, shorter reads, and adapters. High-quality reads (clean reads) were assembled with the default parameters by using SPAdes v3.13.1 [35] and NOVOPlasty v3.2, with E. equisetina (MH161420) as the reference genome [36]. Online software GeSeq [37] annotated the complete chloroplast genome of two Ephedra species with reference genomes (E. monosperma, Genbank: NC_054357 and E. equisetina, Genbank: MH161420). After manually reviewing, the GenBank files were submitted to GB2sequin to obtain the original sequin files. Sequin software v16.0 was used to check sequin files by adjusting the position of the intron and exon. The circular gene map of the chloroplast genome was drawn by Organellar GenomeDRAW (OGDRAW) [38], and the final cp genomes of E. przewalskii and E. monosperma were submitted to the GenBank [39]. Available online: https://www.ncbi. nlm.nih.gov (accessed on 3 January 2021) (Accession number: E. przewalskii MZ567015 and E. monosperma OK505605).

Comparative Plastomics in Ephedra
We used the online software BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed on 10 January 2021) to determine cp genome sequences with more than 95% coverage and a length was more than 100 kb. Finally, four other Ephedra species were selected to conduct comparative genomic studies, including E. intermedia (NC_044772.1), E. equisetina (MH161420), E. foeminea (NC_029347), and E.sinica (NC_044773). E. przewalskii (MZ567015) and E. monosperma (OK505605) plastome sequences were compared with the above-mentioned Ephedra cp genomes to visualize their similarities and differences using mVISTA online software with Shuffle-LAGAN mode, with E. przewalskii as the reference [40]. IRscope was used to compare LSC (Large Single Copy), IRb (Inverted repeat), SSC (Small Single Copy), and IRa (Inverted repeat) regions of these six complete cp genomes with default parameters, illustrating the contraction and expansion for IR/SC regions [41]. The GC content of the six species was conducted by MEGAX 11.0 [42].
The sequences of six Ephedra complete cp genomes were aligned using MAFFT v7 [45]. Then, we used aligned results to calculate the nucleotide variability (Pi) using a sliding window analysis in DnaSP v6, with a window length of 600 bp and a step size of 200 bp [46]. We chose three Ephedra species (E. przewalskii, E. monosperma, E. intermedia), Gnetum luofuense, and Cycas szechuanensis to study evolutionary selection pressure. Ka Ks_ Calculator v2.0 was used to obtain the ratio of non-synonymous to synonymous rates (Ka/Ks) for shared protein-coding genes (PCGs) of these five species, with genetic code table: 11 bacterial and plant plastid code, the way of calculation: NG [47]. Moreover, we used the PREP online tool (http://prep.unl.edu/, accessed on 30 June 2021) to predict the RNA editing sites for the PCGs of two Ephedra (E. przewalskii, E. monosperma) species.

Phylogenetic Profiling
We utilized 14 sequences of Ephedra cp genomes from NCBI to conduct phylogenetic analysis on concatenated sequences of 68 PCGs, with Cycas szechuanensis (NC_042668.1) as the outgroup (Table S2). PhyloSuite v1.2.2 was used to extract PCGs of these cp genomes [48]. MAFFT v7 was used to align sequences for CDS (Coding Sequence) and manually adjusted the aligned sequences using MEGA v11.0 [42]. ModelFinder was used to find the best-fitting models in IQ-TREE v1.6.12 [49]. We used IQ-TREE v1.6.12 to reconstruct the Maximum likelihood (ML) tree with the GTR + F + G4 model and MrBayes v3.2.6 in Phylo-Suite v1.2.2 to reconstruct the Bayesian inference (BI) tree for the GTR + G + F model, with two parallel runs and 1,000,000 generations [48,50]. Sampling trees every 100 generations, and discarding the first 25% generation (burn-in = 25%) of preheated trees. The branch support analysis was conducted using Ultrafast bootstrap and 5000 bootstrap replications.

Ephedra Chloroplast Genome Features
A total of 10 Gb sequencing data were obtained using Novaseq 6000. All chloroplast genome sequences for Ephedra species showed a highly conservative circular structure with four regions, including Large Single Copy (LSC), Small Single Copy (SSC), and two copies of the Inverted Repeat Region (IRa, IRb) ( Figure 1). The length of the complete chloroplast genomes in E. monosperma (109,604 bp) was longer than E. przewalskii (109,569 bp), and the remaining four full-length cp genomes ranged from 109,550 bp in E. sinica to 109,667 bp in E. intermedia (Table 1). All six Ephedra cp genomes were divergent by only 8117 bp in size. The longest length of the LSC region was 60,027 bp in E. foeminea, and the shortest was 59,936 bp in E. intermedia. The lengths of SSC and IR regions ranged from 8078 bp in E. equisetina to 8247 bp in E. intermedia, and from 20,731 bp in E. przewalskii to 20,753 bp in E. monosperma, respectively.     The overall GC content of six Ephedra species varied from 36.6% to 36.7%, in which LSC and SSC regions ranged from 34.1% to 34.2% and from 27.3% to 27.9%, respectively. The IR regions possessed a GC content of 42% in all Ephedra species (Table 1). Six Ephedra species were identical in gene order and content. A total of 118 genes were annotated, including 73 protein-coding genes, 37 tRNA genes, and eight rRNA genes. Nineteen genes located in IR regions were duplicated, whereas others were unique. The LSC regions contained 58 PCGs and 20 tRNA. The IR regions contained 7 PCGs, 8 tRNA, and four rRNA. The SSC regions had four PCGs, and one tRNA in six Ephedra species ( Table 2). The ycf3 gene included two introns. The ycf3 and rps12 gene contained three exons, and the remaining ten genes contained two exons. The rps12 gene was a trans-splicing gene, whose exons were split between LSC and IR regions (Table 3). Table 2. List of annotated genes in six Ephedra cp genomes.

Category of Genes
Group of Gene Gene IDs

Genes for Photosynthesis
Subunits of photosystem I psaA l ; psaB l ; psaC s ; psaI l ; psaJ l Subunits of photosystem II psbA * ,i,l ; psbB l ; psbC l ; psbD l ; psbE l ; psbF l ; psbH l ; psbI l ; psbJ l ; psbK l ; psbL l ; psbM l ; psbN l ; psbT l ; psbZ l ; psbN ** ,l Subunits of Cytochrome b/f complex petA l ; petB l ; petD * ,l ; petG l ; petL l ; petN l Subunits of ATP synthase atpA l ; atpB l ; atpE l ; atpF * ,l ; atpH l ; atpI l Large subunit of RUBISCO rbcL l Maturase matK l Other genes Envelope membrane protein cemA l Photochlorophyllide reductase subunit B/L/N chlB l ; chlL d,i ; chlN d,i C-type cytochrome synthesis gene ccsA s Protease clpP l Translational initiation factor infA l Genes of unknown function Conserved open reading frames ycf1 s ; ycf2 d,i Assembly/stability of photosystem I ycf3 ** ,l ; ycf4 l Note: d Gene with copies, * , ** Gene with one intron/ two introns, l,s,i Gene located in LSC/SSC/IR region, l&i Gene was a trans-splicing gene, whose exons could be found in the LSC and IR regions.

Repeat Sequences and SSR Analysis
We analyzed six cp genome sequences for short repeats (SSRs, simple sequence repeats). The result indicated mononucleotide was the most abundant repeat type, but no hexanucleotide was found in Ephedra. 61 Figure 2). Among these SSRs, there was the most mononucleotide with the number in E. foeminea (50) and the least in E. intermedia (42). Besides, we detected SSR in various regions of cp genomes, including LSC, SSC, IR, CDS, rRNA, tRNA, and IGS (Intergenic spacers) ( Figure S1). SSRs were more abundant in IGS regions than in other regions, but in rRNA regions, they were the least abundant. The mononucleotide repeat analysis results were presented in six species (E. przewalskii, E. monosperma, E. intermedia, E. equisetina, E. foeminea, E. sinica): the highest number was poly A/T, ranging from 40 to 47, and the lowest was poly C/G, varying from one to three. Only one di-nucleotide (AT/TA) and trinucleotide (ATA/TTA), five tetra-nucleotide (AGGT/ATTG, CAAA/TTCT, ATAA/ATCT, ATAG/AATA, and CTAC/CTAT), two pentanucleotide (TTTTA/TTTTC, ATAAA/AAGAA) were discovered in each cp genomes, while ATTTC was merely in E. przewalskii ( Figure S2). Also, we conducted the long repeats analysis that detected 125 non-overlapped repeats (54 palindromic, 34 forward, 19 complement, and 18 reverse repeats) in six Ephedra cp genomes (Figure 3). The palindromic repeats were the most common in these genomes, and the number varies from seven to eight. No reverse repeat was detected in E. przewalskii, whereas seven reverse repeats were found in E. monosperma. All four types of repeats were more abundant in the LSC region than in the SSC and IR regions. Moreover, we identified the length statistics of long repeat sequences in different size ranges for these cp genomes ( Figure S3). Also, we conducted the long repeats analysis that detected 125 non-overlapped repeats (54 palindromic, 34 forward, 19 complement, and 18 reverse repeats) in six Ephedra cp genomes ( Figure 3). The palindromic repeats were the most common in these genomes, and the number varies from seven to eight. No reverse repeat was detected in E. przewalskii, whereas seven reverse repeats were found in E. monosperma. All four types of repeats were more abundant in the LSC region than in the SSC and IR regions. Moreover, we identified the length statistics of long repeat sequences in different size ranges for these cp genomes ( Figure S3).

Repeat Sequences and SSR Analysis
We analyzed six cp genome sequences for short repeats (SSRs, simple sequence repeats). The result indicated mononucleotide was the most abundant repeat type, but no hexanucleotide was found in Ephedra. 61 Figure 2). Among these SSRs, there was the most mononucleotide with the number in E. foeminea (50) and the least in E. intermedia (42). Besides, we detected SSR in various regions of cp genomes, including LSC, SSC, IR, CDS, rRNA, tRNA, and IGS (Intergenic spacers) ( Figure S1). SSRs were more abundant in IGS regions than in other regions, but in rRNA regions, they were the least abundant. The mononucleotide repeat analysis results were presented in six species (E. przewalskii, E. monosperma, E. intermedia, E. equisetina, E. foeminea, E. sinica): the highest number was poly A/T, ranging from 40 to 47, and the lowest was poly C/G, varying from one to three. Only one di-nucleotide (AT/TA) and trinucleotide (ATA/TTA), five tetra-nucleotide (AGGT/ATTG, CAAA/TTCT, ATAA/ATCT, ATAG/AATA, and CTAC/CTAT), two pentanucleotide (TTTTA/TTTTC, ATAAA/AAGAA) were discovered in each cp genomes, while ATTTC was merely in E. przewalskii ( Figure S2). Also, we conducted the long repeats analysis that detected 125 non-overlapped repeats (54 palindromic, 34 forward, 19 complement, and 18 reverse repeats) in six Ephedra cp genomes (Figure 3). The palindromic repeats were the most common in these genomes, and the number varies from seven to eight. No reverse repeat was detected in E. przewalskii, whereas seven reverse repeats were found in E. monosperma. All four types of repeats were more abundant in the LSC region than in the SSC and IR regions. Moreover, we identified the length statistics of long repeat sequences in different size ranges for these cp genomes ( Figure S3).

Codon Usage
We used 73 shared PCGs to analyze the codon usage bias and to calculate the relative synonymous codon usage (RSCU) value by codonW in all six cp genomes (Figure 4). There were twenty Amino acids and 64 codons, including three stop codons UAA, UAG, and UGA. There was only one codon in Methionine (Met) and Tryptophan (Trp). This study illustrated that there were 22,897 codons in E. przewalskii, 27,414 codons in E. monosperma, 27,631 codons in E. intermedia, 27,625 codons in E. equisetina, 27,620 codons in E. foeminea, and 27,623 codons in E. sinica encoded 73 PCGs, respectively. Moreover, in the six cp genomes, Leucine (Leu) was the most frequent amino acid with the codon number ranging from 2038 to 3603. Trp was the least frequent amino acid with the codon numbers varying from 371 to 490. As in Figure 4, the results of RSCU values indicated slight differences among E. przewalskii and E. monosperma. The RSCU value of 30 codons was more significant than 1 with A/T endings. Other codons were less than 1 with G/C endings, and the value was equal to 1 with only one codon.
reverse repeats; C: complement repeats; F: forward repeats; P: palindromic repeats. The num four types of long repeats were shown in different colors.

Codon Usage
We used 73 shared PCGs to analyze the codon usage bias and to calculate the re synonymous codon usage (RSCU) value by codonW in all six cp genomes (Figure 4 Moreover, in the s genomes, Leucine (Leu) was the most frequent amino acid with the codon nu ranging from 2038 to 3603. Trp was the least frequent amino acid with the codon num varying from 371 to 490. As in Figure 4, the results of RSCU values indicated differences among E. przewalskii and E. monosperma. The RSCU value of 30 codon more significant than 1 with A/T endings. Other codons were less than 1 with endings, and the value was equal to 1 with only one codon.

Divergence in Six Ephedra Chloroplast Genome
Results of mVISTA revealed, that non-coding regions were less conserved than proteincoding regions ( Figure 5). The LSC and SSC regions were more divergent than the IRs regions, and the rRNA gene was highly conserved. trnH gene only appeared in the IR region of E. foeminea. The ycf1 gene spanned thr the IRa/SSC junction regions and ranged from 6053 to 6062 bp in length. This extended by 17 bp of the same length into the IRa regions in five Ephedra species. D the contraction of the ycf1 gene in E. intermedia, the length was shorter than its w length (6056 bp)  Also, we compared the boundaries of LSC, SSC, and IR among six cp genomes with IRscope tools. The result exhibited that six Ephedra species had little variations of IR/LSC and IR/SSC junction position and characteristics ( Figure 6). In five Ephedra species, the rpl2 gene entirely existed in the LSC region. It was 53 to 149 bp away from the LSC/IRb junction regions, except that E. foeminea had an extremely short length of the rpl2 gene. Three genes (trnI, rps15, and chlN) were entirely situated at IR (IRa, IRb) regions, and the rps15 gene was 80 to 88 bp away from the IRb/SSC junction regions. The psbA gene was located in the IRa region of the other Ephedra species, but it moved to the LSC region in E. foeminea. The trnH gene only appeared in the IR region of E. foeminea. The ycf1 gene spanned through the IRa/SSC junction regions and ranged from 6053 to 6062 bp in length. This gene extended by 17 bp of the same length into the IRa regions in five Ephedra species. Due to the contraction of the ycf1 gene in E. intermedia, the length was shorter than its whole length (6056 bp) Figure 6. Comparisons of LSC, SSC, and IR region junctions among the six Ephedra cp genomes. The numbers above genes represent the distance between the gene and the junction.

Evolutionary Rates in Protein-Coding Genes of Ephedra species
We used 68 PCGs from five cp genomes (E. przewalskii, E. monosperma, E. intermedia, Gnetum luofuense, Cycas szechuanensis), with Cycas szechuanensis as the reference. The final results were the average values of non-synonymous nucleotide substitutions (Ka), synonymous nucleotide substitutions (Ks), and Ka/Ks for 68 PCGs, respectively (Table  S4). Among the Ks value of genes, 0.0005 in ycf2 and 0.063 in psbT were the smallest and the largest value, respectively. All PCGs showed a lower Ka value. The smallest Ka value was 0.0002 in psaB, and the largest was 0.007 in rps12 and rps14. The Ka/Ks values of 68 PCGs were less than 1, meaning that the synonymous substitutions rates were higher than the non-synonymous nucleotide substitutions. It exhibited that all of them undergo purifying selection, ranging from 0 to 0.97. The largest Ka/Ks value was in rpoA, and the smallest was in cemA and chlB.

Evolutionary Rates in Protein-Coding Genes of Ephedra species
We used 68 PCGs from five cp genomes (E. przewalskii, E. monosperma, E. intermedia, Gnetum luofuense, Cycas szechuanensis), with Cycas szechuanensis as the reference. The final results were the average values of non-synonymous nucleotide substitutions (Ka), synonymous nucleotide substitutions (Ks), and Ka/Ks for 68 PCGs, respectively (Table S4). Among the Ks value of genes, 0.0005 in ycf2 and 0.063 in psbT were the smallest and the largest value, respectively. All PCGs showed a lower Ka value. The smallest Ka value was 0.0002 in psaB, and the largest was 0.007 in rps12 and rps14. The Ka/Ks values of 68 PCGs were less than 1, meaning that the synonymous substitutions rates were higher than the non-synonymous nucleotide substitutions. It exhibited that all of them undergo purifying selection, ranging from 0 to 0.97. The largest Ka/Ks value was in rpoA, and the smallest was in cemA and chlB.

Predicted RNA Editing Sites for E. przewalskii and E. monosperma
We predicted the RNA editing sites for E. przewalskii and E. monosperma (Figure 7; Table S5). We found 57, and 56 predicted RNA editing sites in the 15 PCGs of E. przewalskii and E. monosperma, respectively. Among these PCGs, the RNA Polymerase group had the highest number of predicted RNA editing sites. Specifically, the genes rpoB, rpoC2, rpoC1, and rpoA possessed fourteen, eleven, nine, and two RNA editing sites, respectively. All predicted editing sites were C to U transitions, and the most frequent amino acid conversions were from proline to serine. ity 2022, 14, 792 12 predicted editing sites were C to U transitions, and the most frequent amino conversions were from proline to serine.

Phylogenetic Inference
We inferred the phylogenetic relationship of 14 cp genomes sequences of Ephedra observed the same tree topology in maximum likelihood (ML) and Bayesian inference analysis (Table S2; Figure 8). The maximum likelihood of bootstrap support (MLBS) bayesian posterior probability (BPP) were high for each lineage.

Genome Feature in Ephedra
With the advancement of sequencing technology and cost reduction, chloroplast genome research has been continuously growing, with an increasing number of chloroplast genomes published in public databases, deepening people's understanding and knowledge of plastid genomes [51]. These abundant genomic data paved the way for plant phylogenetic analysis [52,53]. The chloroplast genome had many advantages, such as maternal inheritance, highly conserved, abundant gene composition, etc. [52]. It has evolved into one of the most effective phylogenetic studies and molecular taxonomy tools [52]. The genomic information was extremely valuable in terms of species origin, evolution, and species relationships, and it has solved numerous phylogenetic problems [54,55]. As a traditional Chinese herbal medicine, Ephedra had crucial medicinal value [56]. We revealed the phylogenetic relationship between ten Ephedra species based on shared chloroplast PCGs, providing valuable phylogenetic information for this genus. At the same time, conducting comparative genomics and evolutionary analysis can provide abundant data from the plastid genome for species identification of Ephedra.
High AT base content found in the cp genomes of six Ephedra species was also reported in gymnosperms [25]. The GC content ranged from 36.6% to 36.7%, with that of

Genome Feature in Ephedra
With the advancement of sequencing technology and cost reduction, chloroplast genome research has been continuously growing, with an increasing number of chloroplast genomes published in public databases, deepening people's understanding and knowledge of plastid genomes [51]. These abundant genomic data paved the way for plant phylogenetic analysis [52,53]. The chloroplast genome had many advantages, such as maternal inheritance, highly conserved, abundant gene composition, etc. [52]. It has evolved into one of the most effective phylogenetic studies and molecular taxonomy tools [52]. The genomic information was extremely valuable in terms of species origin, evolution, and species relationships, and it has solved numerous phylogenetic problems [54,55]. As a traditional Chinese herbal medicine, Ephedra had crucial medicinal value [56]. We revealed the phylogenetic relationship between ten Ephedra species based on shared chloroplast PCGs, providing valuable phylogenetic information for this genus. At the same time, conducting comparative genomics and evolutionary analysis can provide abundant data from the plastid genome for species identification of Ephedra.
High AT base content found in the cp genomes of six Ephedra species was also reported in gymnosperms [25]. The GC content ranged from 36.6% to 36.7%, with that of IR regions significantly higher than that of the SC regions in the six species (Table 1). There were several examples consistent with previous studies [53,57,58]. Gene loss, duplication, and transfer between chloroplast and nuclear genomes, reliable sources of evolution as they are, also occurred in this study [59]. These events also occurred in this study. It may offer helpful information on evolutionary studies for Ephedra and whole gymnosperm plants.
Ndh (NADH dehydrogenase subunit) encoded the subunits of the proton-pumping NADH and played a significant role in green plants [60]. The Ephedra species in this study had lost all the ndh family genes, consistent with the result of the same genus and related genus, Welwitschia [25,26]. Also, we believed that there might be two reasons for the loss of ndh genes in the Ephedra species. First, this type of gene did not play any role in the evolution of Ephedra and was eliminated after selection. Secondly, the absence of ndh genes in some gymnosperms, including Pinaceae and Gnetales, was related to the living conditions in the Mesozoic era, including high temperature and carbon dioxide concentration [61][62][63]. However, there was no relevant evidence to prove that the missing ndh functional genes might be transferred to the nucleus to become a nuclear gene due to the high mutation rate. The absence of genes did not appear to affect the photosynthetic function of green plants [64,65]. Evidence has shown that Welwitschia plants mainly rely on glyceric acid to metabolize CAM for photosynthesis due to the loss event of the ndh genes [60]. Therefore, we assumed that Ephedra also depends on this metabolism to replace the function of the ndh genes. Losing genes can be a standard feature of the Ephedra species to clarify its phylogenetic relationship. Further studies with increasing samples are necessary to provide more information for the evolutionary analysis of Ephedra. The gene matK was one of the fastest evolving genes in the cp genome and located in the intron of trnK-UUU [66]. This gene was broadly used in phylogenetic studies within families, and inter-genera [67][68][69][70]. This gene had a 0.751 Ka/Ks value in this study, indicating it experienced purifying selection.

Comparison of Genomes for Ephedra
Six cp genomes were carefully compared to determine their identity and divergence. In mVISTA, the plot could visualize the sequence identity of six Ephedra species (Figure 2). The coding region of six cp genomes was more conservative than the non-coding region, congruent with other studies [71]. We found 31 non-coding highly variable regions, and nine PCGs were also highly variable. These highly variable regions and genes might be molecular markers for Ephedra species identification and population genetic study. There were subtle divergences in the LSC/IRS/SSC boundary gene distribution in the six Ephedra species. The psbA gene moved to the LSC region in E. foeminea, its length was longer than other species. This gene was a complete gene in E. foeminea. Compared with other Ephedra species, E. foeminea's IR region and rpl2 gene in length were shorter, with its IR region showing noticeable contraction changes. The rpl2 gene was a pseudogene in E. foeminea. The ycf1 gene was entirely located in the SSC region of E. intermedia, it was a pseudogene in this species. These results can be used as one of the features to distinguish Ephedra species.
Nucleotide diversity was a proposed measure, to express the degree of nucleotide polymorphism in a population [72]. We analyzed the sequence variation of six complete cp genomes. The IR regions were found to be more conserved than the SC regions. The nucleotide polymorphisms in the SC region were greater than that in the IR region. The sequence of the ccsA gene with the highest Pi value (Pi = 0.018) was located in the SSC region. The trnF-GAA_trnfM-CAU intergenic spacer showed the highest variation (Pi = 0.01389) and was located in the LSC region.
SSRs are highly variable genetic markers and could be used for species identification, population genetics analyses, or evolutionary biology studies [73,74]. Our SSRs analysis revealed that the single-base repeat type (A/T) variation was the most abundant, implying that there were more replications in six Ephedra species. SSRs were mainly found in the LSC and the IGS, while fewer SSRs were situated in the IR region. This result has been observed in other plants [75]. These results may be used to study genetic diversity for Ephedra species. The number and distribution of the four types of long repeats differed lightly between the six Ephedra species. All species had complement, forward, and palindromic repeats. The reverse repeat was also identified in all Ephedra species except for E. przewalskii and E. sinica.

Codon Usage Bias Analysis
Understanding codon usage bias might reveal the effects of long-term evolution on the plant genome [76]. Due to the combined effects of gene selection, mutation, and drift in the long-term evolution process, most species had various Codon Usage Bias [77]. Interestingly, we found that codon usage preference mostly ends with AT, consistent with the determination result of AT-rich base content. CAI could estimate gene expression levels as an essential indicator of species codon usage preference [78,79]. These values were all-around 0.175, indicating a relatively low codon usage preference. The parameter ENc could quantify codon usage bias with a greater than 46 value in Ephedra species, indicating that codon preference was weak [80]. Similar results were observed for other species [81]. In general, these indicators analysis found that codon usage preferences of Ephedra species were not strong. Still, this study was limited to these few parameters, which were insufficient to explain the preference strength of Ephedra species. Therefore, the sample of Ephedra should be increased for a more specific analysis of codons usage bias in future studies.
Since the discovery of the first RNA editing event in trypanosome mitochondria [82], many studies on RNA editing have been published [83,84]. RNA editing events also existed in plants, which affect plants' growth, development, and response to various stresses [85][86][87][88]. For most of the chloroplast genomes of gymnosperms and angiosperms, only a few dozen RNA editing sites could be predicted [89]. Similarly, we observed only 56-57 RNA editing sites in the cp genomes of E. przewalskii and E. monosperma. The most frequent type of RNA editing was from C to U conversion in a variety of plants such as thale cress (Arabidopsis thaliana) [90], grape (Vitis vinifera) [91], tobacco (Nicotiana tabacum) [92], as well as this two Ephedra species. Our study indicated that the most frequent amino acid conversion was proline to serine in the two Ephedra species, similar to the results in other studies [56,59]. Moreover, all the DNA-dependent RNA polymerase genes involved the most predicted RNA editing sites. These results are published for the first time and can offer a novel insight into future RNA editing studies for E. przewalskii and E. monosperma.

Evolution Analysis
Analysis of the non-synonymous and synonymous substitutions ratio has become a significant element of molecular evolution studies [93]. It was widely utilized to determine selection pressure for PCGs [94]. The Ka/Ks value of 68 PCGs was less than 1, meaning non-synonymous substitutions were less than synonymous substitutions. This often results in harmful traits for non-synonymous substitutions that would face elimination [95]. These PCGs were subject to purifying selection, similar to that in other studies [27,96,97]. The DNA-dependent RNA polymerase function gene group of E. przewalskii and E. monosperma showed a high Ka/Ks value, especially the rpoA with the highest value (0.97). This illustrated that these genes had a faster evolution rate than other functional groups of genes.

Phylogenetic Analysis
Many scholars have explored the phylogenetic relationship of Ephedra, mainly based on RAPD markers, nrDNA, and cpDNA sequences [11,16,17,98]. This study also utilized the concatenated sequences of chloroplast PCGs to construct phylogenetic trees based on different methods (ML/BI), with high bootstrap value and Bayesian posterior probability. All Ephedra species clustered together in a well-supported monophyletic clade, congruing with previous studies [19,20]. In our study, E. przewalskii and E. monosperma did not cluster together, consistent with the previously reported phylogenetic trees based on molecular data [18,20,99]. Moreover, from the phylogenetic tree's topological structure, it could be observed that Ephedra species from different countries are nested together. The Ephedra species (E. californica, E. altissima, E. alata) from North America, Africa, and Kazakhstan were all nested in the sister branches of China species (E. foeminea), consistent with previous findings [99]. In summary, we reported the phylogenetic relationship of the ten Ephedra species based on the concatenated sequences of PCGs. It offers valuable information for the phylogeny of the genus Ephedra.

Conclusions
Our study reported the complete chloroplast genome of E. przewalskii and E. monosperma. We also revealed the slight differences in cp genome characterization, the number of long repeats and SSRs, codon usage bias, and RNA editing sites. Moreover, all PCGs in Ephedra species were affected by purifying selection. Also, we revealed the phylogenetic relationship of ten Ephedra species with high maximum likelihood bootstrap support and Bayesian posterior probability. These data provide valuable and abundant information for the phylogenetic analysis and species identification of the medical plant, Ephedra.
Supplementary Materials: The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/d14100792/s1. Figure S1: The number of short sequence repeats (SSRs) in LSC, SSC, IR, CDS, rRNA, tRNA, and IGS in the cp genomes of six Ephedra species; Figure S2: The frequency of each SSR in the cp genomes of six Ephedra species. The order of six columns of each SSR is E. przewalskii, E. monosperma, E. intermedia, E. equisetina, E. foeminea, E. sinica, respectively; Figure S3: The number of four types of long repeats with different lengths ranges in the cp genomes of six Ephedra species; R: reverse repeats; C: complement repeats; F: forward repeats; P: palindromic repeats. Repeats with different lengths for six cp genomes are shown in different colors; Figure S4: The results of the nucleotide variability (Pi) values in the six Ephedra species. The x-axis is the position of the midpoint of the window. The y-axis is the nucleotide diversity (Pi) values of each window; Table S1: Specimen information of E. przewalskii and E. monosperma; Table S2: The cp genome sequences used in the phylogenetic analysis; Table S3: Codon usage of protein-coding genes in chloroplast genomes of six Ephedra species (Unit: %); Table S4: The Ka/Ks ratio of 68 protein-coding genes of five cp genomes (E. przewalskii, E. monosperma, E. intermedia, Gnetum luofuense, Cycas szechuanensis), with C. szechuanensis as the reference; Table S5: The number of predicted RNA editing sites in the cp genomes of E. przewalskii and E. monosperma.

Institutional Review Board Statement: Not applicable.
Data Availability Statement: All data generated or analyzed during this study are included in this published article.