Complete Chloroplast Genome Sequences of Two Ehretia Trees ( Ehretia cymosa and Ehretia obtusifolia ): Genome Structures and Phylogenetic Analysis

: Ehretiaceae is a family in the order Boraginales. It contains more than 150 species. The Ehretiaceae classiﬁcation has remained elusive and changed over time from subfamily to family, or vice versa. In this paper, we sequenced, characterized, and analyzed the complete chloroplast (cp) genomes of Ehretia cymosa and Ehretia obtusifolia , and their cp genomes were compared to those of related species. The length of the chloroplast genomes of E. cymosa was 156,328 bp, whereas that of E. obtusifolia was 155,961 bp. Each genome contained 114 genes, including 80 protein-coding genes, 4 rRNA genes, and 30 tRNA genes. Repeat analysis revealed that complement, forward, palindromic, and reverse repeats were present in the chloroplast genomes of both species. Simple sequence repeat analysis showed that the chloroplast genomes of E. cymosa and E. obtusifolia comprise 141 and 139 microsatellites, respectively. Phylogenetic analysis based on Bayesian and maximum likelihood analyses divided the order Boraginales into two well-supported clades. The ﬁrst clade includes a single family (Boraginaceae), and the second clade includes three families (Ehretiaceae, Cordiaceae, and Heliotropiaceae). This study provides valuable genomic resources and insights into the evolutionary relationships within Boraginales.


Introduction
The Ehretiaceae (Ehretioideae) is a family of the flowering plant order Boraginales. The family, as most recently circumscribed, contains seven genera (Bourreria, Cortesia, Ehretia, Halgania, Lepidocordia, Rochefortia, and Tiquilia) and comprises more than 150 species widely spread in tropical and subtropical regions [1]. The Ehretiaceae members are mostly trees with the following characteristics: leaves are entire and alternate in arrangement; inflorescence is terminal or axillary; flowers are 5-merous; bisexual or unisexual; corolla is white, blue, or red; shape is tubular, campanulate, or rotate; five stamens, ovary in a slender terminal style, and two stigmas divided slightly or deeply; four ovules in two or four locules; fruit drupaceous, dry or fleshy [1][2][3].
Genetic information allows researchers to determine the evolutionary relationships among organisms. The chloroplast (cp) genome contains functional genes that are essential Genetic information allows researchers to determine the evolutionary relationships among organisms. The chloroplast (cp) genome contains functional genes that are essential to plant cells, and these genes offer valuable genetic data for comparative studies of the evolutionary relationships between plants [23]. A chloroplast is an organelle inside a plant cell that uses the photosynthetic process to transform light energy into chemical energy [24]. The structure, arrangement, and content of genes in the chloroplast genomes of angiosperm species are remarkably conserved [25]. The cp genome of flowering plant species has a circular quadripartite structure, rarely with multibranched linear structures [26]. The cp genome comprises two inverted repeat regions (IRs): a large single-copy region (LSC) and a small single-copy region (SSC) [27]. More than 5998 cp genome sequences have been reported in the National Center for Biotechnology Information (NCBI) database, demonstrating the widespread use of cp genome sequencing in plant phylogenetic research [28]. In comparison to using a few genes, the complete chloroplast genome can provide more accurate answers regarding evolutionary relationships [29].
To date, only four chloroplast genome sequences of the Ehretiaceae family (Ehretia acuminata, Ehretia. dicksonii, Ehretia. longiflora, and Tiquilia plicata) have been reported in the GenBank database. In this study, we sequenced the cp genomes of two species, namely Ehretia cymosa and Ehretia obtusifolia ( Figure 1). The cp genome sequences of five Ehretia species, eight species from different Boraginales families, and two outgroup species from Gentianales and Lamiales were compared to observe the sequence variation and to understand the evolutionary relationships between the Ehretiaceae family and other families in the order. The analyses also provided valuable details about the features of the genomes, including their GC content, long and simple sequence repeats, RNA editing sites, utilization of codons, and IR junctions. The main goals of this study were to characterize and analyze the complete chloroplast genomes of E. cymosa and E. obtusifolia and provide insight into the phylogenetic relationships of Ehretiaceae at the family level.   . Specimens were identified using morphological approaches. Total genomic DNA was extracted from leaves using the DNeasy Plant Mini Kit, and the genomic DNA's quality and concentration were assessed using a Qubit fluorometer and agarose gel electrophoresis.

Sequencing and Assembly
Library construction and sequencing were carried out at BGI Genomics Company in Hong Kong using the DNBseq platform; the raw data were filtered by removing contamination, low-quality reads, and adaptor sequences using SOAPnuke v.2.1.7 software [30] to obtain clean data (10 GB) with 150 bp pair-end reads. Genome assembly was performed using NOVOPlasty v.4.3.1 [31]; the complete chloroplast genome sequence of E. dicksonii (MZ555766) was used as the reference genome to assemble the E. cymosa and E. obtusifolia chloroplast genomes. Finally, a circular contig comprising the complete cp genome sequence was generated for each species.

Gene Annotation
Annotation and gene prediction of complete chloroplast genomes were performed using GeSeq [32] and corrected manually using Sequin 15.5 "http://www.ncbi.nlm.nih. gov/Sequin/ (accessed on 12 February 2023)". The circular map of the cp genome was drawn using OGDRAW 1.3.1 [33]. Finally, the results of the cp genome sequences were submitted to GenBank with the following accession numbers: E. cymosa (OP679792) and E. obtusifolia (OQ730227).

Codon Usage and RNA Editing Sites
MEGA 6.0 [34] was used to analyze the sequences and determine the base composition, relative synonymous codon, and codon usage. The RNA editing sites in the protein-coding genes of E. cymosa and E. obtusifolia were predicted using the PREPACT 3.0 tool [35]. The prediction was performed on the BLASTX analysis mode using Arabidopsis thaliana (NC_000932.1) and Pisum sativum (NC_014057.1) as reference sequences, with the cutoff E-value set to 0.8.

Repeat Analysis of Chloroplast Genomes
The long repeats (complement, forward, palindromic, and reverse) were detected using REPuter v.2 software [36]. The minimal repeat size was set at 15 bp and the identified similarity between the repeat sequences was more than 90%. Using MISA v.2.1 software [37], simple sequence repeats (SSRs) were detected. The parameters used were 8, 5, 4, 3, 3, and 3, to identify mon, di, tri, tetra, penta, and hexa microsatellite repeats, respectively.

Characterization of Substitution Rate
The protein-coding sequences were separately aligned from the complete chloroplast sequences of E. cymosa and E. obtusifolia using Geneious software 2023.0.4 [38]. DNAsp v5 software [39] was used to determine the nonsynonymous (dN) and synonymous (dS) substitution rates and to reveal the genes that were under selective pressure.

Genome Comparison
The chloroplast genomes of E. cymosa and E. obtusifolia were analyzed and compared with those of the other Ehretia species available in the GenBank database: E. acuminata (MW801108.1), E. dicksonii (MZ555766.1), and E. longiflora (MW801239.1), using the mVISTA alignment program [40] in Shuffle-LAGAN mode. The cp genome of E. cymosa was set as the reference. Comparing and visualizing the boundaries of the LSC, SSC, and IR junction sites among the five Ehretia species was performed using the IRscope tool [41]. Although the chloroplast genomes of E. acuminata and E. longiflora were available in the GenBank, both genomes were in unverified status (lack annotation). Therefore, we performed the annotation and gene prediction of both genomes to use in our analyses.

Phylogenetic Analysis
A comparative analysis was performed using the complete chloroplast genome sequences of five Ehretia species (E. acuminata, E. cymosa, E. dicksonii, E. longiflora, and E. obtusifolia), eight taxa representing three families belonging to the Boragianles order (Boraginaceae, Cordiaceae, and Heliotropiaceae), and two species from the Gentianaceae and Lamiaceae families, which were used as outgroups. The MAFFT v.7.520 software [42] was used (default settings) to align all the sequences. The phylogenetic trees were constructed based on two analyses: Bayesian inference (BI) using MrBayes v.3.2.7 [43] and maximum likelihood (ML) using IQ-TREE version v.2.2.2.6 [44]. First, BI analysis was carried out using the following settings: run for 1,000,000 generations, printing and sampling every 500 generations, and the best substitution model (GTR + G), which was selected using jModelTest version 3.7 [45]. Second, ML analysis was carried out using the following settings: 10,000 ultra-fast bootstrap (UFBOOT) replicates and the best substitution model (TVM + F + I + G4), which was selected using ModelFinder [46].

Characteristics of E. cymosa and E. obtusifolia
The complete chloroplast genomes of E. cymosa and E. obtusifolia were found to be 156,328 bp and 155,961 bp in size, respectively, and they had a circular and quadripartite structure (Table 1 and Figure 2). The cp genomes of E. cymosa and E. obtusifolia consisted of the LSC region with a length of 86,624 bp and 86,211 bp, respectively; the SSC region with a length of 18,142 bp and 18,154 bp, respectively; and a pair of the IR regions with a length of 25,781 bp and 25,798 bp, respectively (Table 1). Overall, the GC content of E. cymosa was determined to be 37.86%, whereas the GC content of E. obtusifolia was 37.87%. Moreover, the IR regions had a higher GC content, ranging from 43.17% in E. cymosa to 43.18% in E. obtusifolia. The LSC regions had GC contents of 35.91% in both genomes. The SSC regions had the lowest GC content, ranging from 32.15% in E. cymosa to 32.01% in E. obtusifolia (Table 1). SSC region comprised 12 protein-coding genes and 1 tRNA gene; and the IR regions comprised 8 protein-coding genes, 4 rRNA genes, and 7 tRNA genes. Introns were found in some of the tRNA and protein-coding genes of both genomes. A total of 18 of the 114 unique genes comprised introns. In this regard, 6 were tRNA genes and 12 were protein-coding genes, while 16 genes had 1 intron and 2 genes (clpP1 and ycf3) had 2 introns (Table S2). The longest intron was present in the trnK-UUU gene, where it was 2469 bp in length in E. cymosa and 2475 bp in length in E. obtusifolia (Table S2). In the inner map, the brightly grey region refers to the AT contents, while the dark grey region refers to the GC content. The colored bars indicate functional genes. Asterisk symbol (*) refers to the genes with introns. The SSC and LSC represent the small and large single-copy regions. The IR represents inverted repeat regions.

Codon Usage
The protein-coding and tRNA sequences of E. cymosa and E. obtusifolia were used to determine the frequency of codon usage in both species. The relevant sequence lengths In the inner map, the brightly grey region refers to the AT contents, while the dark grey region refers to the GC content. The colored bars indicate functional genes. Asterisk symbol (*) refers to the genes with introns. The SSC and LSC represent the small and large single-copy regions. The IR represents inverted repeat regions.
In addition, the cp genomes of E. cymosa and E. obtusifolia comprised a total of 134 genes. The number of unique genes was 114, 19 of which were duplicated in the IR regions; the rps12 gene was present in the LSC region as well as duplicated in the IR regions (Table S1). In both genomes, there were 80 protein-coding genes, 4 rRNA genes, and 30 tRNA genes. More specifically, the LSC region comprised 60 protein-coding genes and 22 tRNA genes; the SSC region comprised 12 protein-coding genes and 1 tRNA gene; and the IR regions comprised 8 protein-coding genes, 4 rRNA genes, and 7 tRNA genes. Introns were found in some of the tRNA and protein-coding genes of both genomes. A total of 18 of the 114 unique genes comprised introns. In this regard, 6 were tRNA genes and 12 were protein-coding genes, while 16 genes had 1 intron and 2 genes (clpP1 and ycf3) had 2 introns (Table S2). The longest intron was present in the trnK-UUU gene, where it was 2469 bp in length in E. cymosa and 2475 bp in length in E. obtusifolia (Table S2).

Codon Usage
The protein-coding and tRNA sequences of E. cymosa and E. obtusifolia were used to determine the frequency of codon usage in both species. The relevant sequence lengths were 82,542 bp in E. cymosa and 82,181 bp in E. obtusifolia. The cp genome of E. cymosa included 27,513 codons, with leucine (11.11%) being the most common and tryptophan (2.02%) the least common ( Figure 3). Similarly, the cp genome of E. obtusifolia featured

RNA Editing Sites
The RNA editing sites (C-U editing) in E. cymosa and E. obtusifolia chloroplast genomes were predicted using the PREPACT tool. A total of 31 RNA editing sites were predicted in each genome, distributed across 16 protein-coding genes. The ndhB gene had the most RNA editing sites (nine), followed by the ndhD and rpoB genes (four each), while the remaining genes had one or two editing sites (matK, atpF, rps2, psbZ, rps14, accD, psbE, petB, rpoA, rpl23, ndhF, ndhG, and ndhA) ( Figure 4 and Table S5). In both species, 90.32% of the editing sites were found in the second position of the triplet codon, while 9.68% appeared in the first position of the triplet codon. The analysis also revealed that serine to leucine and proline to leucine were the most common amino acid conversions.

RNA Editing Sites
The RNA editing sites (C-U editing) in E. cymosa and E. obtusifolia chloroplast genomes were predicted using the PREPACT tool. A total of 31 RNA editing sites were predicted in each genome, distributed across 16 protein-coding genes. The ndhB gene had the most RNA editing sites (nine), followed by the ndhD and rpoB genes (four each), while the remaining genes had one or two editing sites (matK, atpF, rps2, psbZ, rps14, accD, psbE, petB, rpoA, rpl23, ndhF, ndhG, and ndhA) ( Figure 4 and Table S5). In both species, 90.32% of the editing sites were found in the second position of the triplet codon, while 9.68% appeared in the first position of the triplet codon. The analysis also revealed that serine to leucine and proline to leucine were the most common amino acid conversions.

Long Repeats
The long repeat sequences in the E. cymosa and E. obtusifolia chloroplast genomes were identified using the REPuter program. The results revealed that both genomes contained all four types of long repeats (complement, forward, palindromic, and reverse),

Long Repeats
The long repeat sequences in the E. cymosa and E. obtusifolia chloroplast genomes were identified using the REPuter program. The results revealed that both genomes contained all four types of long repeats (complement, forward, palindromic, and reverse), with 47 repeats found in E. cymosa and 49 repeats in E. obtusifolia. More specifically, the analysis of the E. cymosa and E. obtusifolia cp genomes identified 2 and 1 complementary repeats, respectively; 20 and 21 palindromic repeats, respectively; 8 and 10 reverse repeats, respectively; and 17 forward repeats in each genome ( Figure 5 and Tables S6 and S7). The majority of the repeats in E. cymosa were between 18 bp and 24 bp in size (82.97%), followed by those between 26 bp and 29 bp (12.76%), and between 41 bp and 44 bp (4.25%). In E. obtusifolia, the majority of the repeats were between 18 bp and 24 bp in size (85.10%), followed by those between 26 bp and 32 bp (14.28%), and between 41 bp and 44 bp (4.08%).

Long Repeats
The long repeat sequences in the E. cymosa and E. obtusifolia chloroplast genomes were identified using the REPuter program. The results revealed that both genomes contained all four types of long repeats (complement, forward, palindromic, and reverse), with 47 repeats found in E. cymosa and 49 repeats in E. obtusifolia. More specifically, the analysis of the E. cymosa and E. obtusifolia cp genomes identified 2 and 1 complementary repeats, respectively; 20 and 21 palindromic repeats, respectively; 8 and 10 reverse repeats, respectively; and 17 forward repeats in each genome ( Figure 5 and Tables S6 and S7). The majority of the repeats in E. cymosa were between 18 bp and 24 bp in size (82.97%), followed by those between 26 bp and 29 bp (12.76%), and between 41 bp and 44 bp (4.25%). In E. obtusifolia, the majority of the repeats were between 18 bp and 24 bp in size (85.10%), followed by those between 26 bp and 32 bp (14.28%), and between 41 bp and 44 bp (4.08%). The intergenic spacer regions in E. cymosa and E. obtusifolia harbored 48.93% and 52.04% of the repeats, respectively. The protein-coding genes contained 34.04% of the repeats in E. cymosa and 30.61% of the repeats in E. obtusifolia, whereas the tRNA genes contained 17.03% of the repeats in E. cymosa and 17.35% of the repeats in E. obtusifolia (Tables S6 and S7). In addition, we compared the results concerning the long repeat types between E. cymosa and E. obtusifolia, and the other Ehretia species available in the GenBank database (E. acuminata, E. dicksonii, and E. longiflora). The analysis revealed the absence of the complementary repeat type in all the species, except for E. cymosa and E. obtusifolia ( Figure 5). Moreover, the palindromic repeat was found to be the most common repeat type in all the taxa except for E. dicksonii, in which the forward repeat was the most common type ( Figure 5).

Simple Sequence Repeats
Simple sequence repeats (SSRs), which are also referred to as microsatellites, were found to be spread throughout both genomes. Indeed, the cp genomes of E. cymosa and E. obtusifolia comprised 141 SSRs and 139 SSRs, respectively (Tables S8 and S9). In the cp genome of E. cymosa, most of the SSRs were mononucleotides (93.61%), with the highest frequency (98.48%) of A/T motif, followed by C/G (1.52%) ( Table 2). In addi- tion, the cp genome of E. cymosa contained one dinucleotide (AT/AT), one trinucleotide (AAG/CTT), two tetranucleotides (AAAC/GTTT and AAAT/ATTT), and one pentanucleotide (AATCC/ATTGG). Similarly, in the cp genome of E. obtusifolia, most of the SSRs were mononucleotides (93.52%), with the highest frequency (98.45%) of A/T motif, followed by C/G (1.55%) ( Table 2). The cp genome of E. obtusifolia also contained one dinucleotide (AT/AT), one trinucleotide (AAG/CTT), two tetranucleotides (AAAC/GTTT and AAAT/ATTT), and one pentanucleotide (AATCC/ATTGG). A comparative analysis of the SSR types was performed using the other Ehretia species available in the GenBank database (E. acuminata, E. dicksonii, and E. longiflora). The results showed that the SSR types ranged from mononucleotide to pentanucleotide repeats ( Figure 6). In this regard, mononucleotide, dinucleotide, trinucleotide, and tetranucleotide repeats were detected in all the genomes, whereas pentanucleotide repeats were absent from E. acuminata and E. dicksonii ( Figure 6).

Comparative Analysis
The IR/LSC and IR/SSC borders in the chloroplast genomes of E. cymosa and E. obtusifolia were compared with those of the other Ehretia species available in the GenBank database (E. acuminata, E. dicksonii, and E. longiflora). The results revealed similarities between the cp genomes of the five species (Figure 7). E. longiflora had the largest cp genome (156,802 bp), followed by E. dicksonii (156,623 bp)

Comparative Analysis
The IR/LSC and IR/SSC borders in the chloroplast genomes of E. cymosa and E. obtusifolia were compared with those of the other Ehretia species available in the Gen-Bank database (E. acuminata, E. dicksonii, and E. longiflora). The results revealed similarities between the cp genomes of the five species (Figure 7). E. longiflora had the largest cp genome (156,802 bp), followed by E. dicksonii (156,623 bp) (Figure 7).

Comparative Analysis
The IR/LSC and IR/SSC borders in the chloroplast genomes of E. cymosa and E. obtusifolia were compared with those of the other Ehretia species available in the GenBank database (E. acuminata, E. dicksonii, and E. longiflora). The results revealed similarities between the cp genomes of the five species (Figure 7). E. longiflora had the largest cp genome (156,802 bp), followed by E. dicksonii (156,623 bp) (Figure 7).   (Figure 7). No genes were located at the boundary of IRa/LSC. trnH and psbA genes were found entirely within the LSC region in both species.

Divergence of Protein-Coding Gene Sequence
To identify the sequence divergence regions, the five Ehretia chloroplast genomes were compared using the E. cymosa genome as a reference (Figure 8). The results showed that all genomes were highly conserved, although a number of variable regions were also identified. More variations were observed in the non-coding regions than in the coding regions, while the majority of the divergences were found in the LSC regions ( Figure 8). The psbA, matK, atpA, rpoC2, rpoB, rbcL, ndhD, and ycf1 genes showed the most divergence within the coding regions ( Figure 8). These divergence markers can be used to clarify the evolutionary relationships within Ehretiaceae. that all genomes were highly conserved, although a number of variable regions were also identified. More variations were observed in the non-coding regions than in the coding regions, while the majority of the divergences were found in the LSC regions (Figure 8). The psbA, matK, atpA, rpoC2, rpoB, rbcL, ndhD, and ycf1 genes showed the most divergence within the coding regions (Figure 8). These divergence markers can be used to clarify the evolutionary relationships within Ehretiaceae. Figure 8. Visual alignment of the five Ehretia chloroplast genomes using E. cymosa as a reference. The x-axis refers to the genomic coordinate, whereas the y-axis refers to the identity percentage (50% Figure 8. Visual alignment of the five Ehretia chloroplast genomes using E. cymosa as a reference. The x-axis refers to the genomic coordinate, whereas the y-axis refers to the identity percentage (50% to 100%). The top arrows refer to the direction of each gene. UTR = untranslated region; CNS = conserved non-coding regions. The sequence alignment was conducted using the mVISTA program.

Characterization of the Substitution Rate
The rates of nonsynonymous/synonymous (dN/dS) substitutions were computed within the protein-coding sequences of E. cymosa and E. obtusifolia to evaluate the selective pressure. The results indicated that the dN/dS ratios were <1 for all the genes in E. cymosa vs. E. obtusifolia, except for the ycf3 and ycf15 genes, which had a dN/dS ratio of 1.6 and 1.03, respectively (Figure 9). The dS substitution values of all the genes ranged from 0 to 0.61 (Figure 9).

Characterization of the Substitution Rate
The rates of nonsynonymous/synonymous (dN/dS) substitutions were computed within the protein-coding sequences of E. cymosa and E. obtusifolia to evaluate the selective pressure. The results indicated that the dN/dS ratios were <1 for all the genes in E. cymosa vs. E. obtusifolia, except for the ycf3 and ycf15 genes, which had a dN/dS ratio of 1.6 and 1.03, respectively (Figure 9). The dS substitution values of all the genes ranged from 0 to 0.61 (Figure 9).

Phylogenetic Analysis
The phylogenetic results based on the BI and ML analyses were identical and so are presented here as a single tree with posterior probability (PP) and bootstrap (BS) support values ( Figure 10). The order Boraginales was split into two main clades, namely Boraginales I and Boraginales II, which obtained strong support (PP = 1/BS = 100). First, the Boraginales I clade included only one family, Boraginaceae, consisting of two subfamilies, Boraginoideae and Cynoglossoideae, which received strong support (PP = 1/BS = 100). The subfamily Boraginoideae comprised the genera Aegonychon and Echium, while the subfamily Cynoglossoideae included the genera Lappula and Trigonotis. Second, the Boraginales II clade comprised three families, namely Ehretiaceae, Cordiaceae, and Heliotropiaceae, which received strong support (PP = 1/BS = 100). In addition, Ehretiaceae and Cordiaceae were recovered as sisters, with strong support (PP = 1/BS = 92).

Phylogenetic Analysis
The phylogenetic results based on the BI and ML analyses were identical and so are presented here as a single tree with posterior probability (PP) and bootstrap (BS) support values ( Figure 10). The order Boraginales was split into two main clades, namely Boraginales I and Boraginales II, which obtained strong support (PP = 1/BS = 100). First, the Boraginales I clade included only one family, Boraginaceae, consisting of two subfamilies, Boraginoideae and Cynoglossoideae, which received strong support (PP = 1/BS = 100). The subfamily Boraginoideae comprised the genera Aegonychon and Echium, while the subfamily Cynoglossoideae included the genera Lappula and Trigonotis. Second, the Boraginales II clade comprised three families, namely Ehretiaceae, Cordiaceae, and Heliotropiaceae, which received strong support (PP = 1/BS = 100). In addition, Ehretiaceae and Cordiaceae were recovered as sisters, with strong support (PP = 1/BS = 92).

Discussion
The complete chloroplast genome provides plenty of genetic information, which allows researchers to clarify the complicated evolutionary relationships among plants [47]. In the present study, we report the cp genomes of two species from the Ehretia genus. The cp genomes of E. cymosa and E. obtusifolia were found to be structurally similar to the cp genomes of other Boraginales species [48][49][50]. The cp genome sizes ranged from 156,328

Discussion
The complete chloroplast genome provides plenty of genetic information, which allows researchers to clarify the complicated evolutionary relationships among plants [47]. In the present study, we report the cp genomes of two species from the Ehretia genus. The cp genomes of E. cymosa and E. obtusifolia were found to be structurally similar to the cp genomes of other Boraginales species [48][49][50]. The cp genome sizes ranged from 156,328 bp in E. cymosa to 155,961 bp in E. obtusifolia (Figure 2). The GC of the cp genomes of E. cymosa and E. obtusifolia ranged from 37.86% to 37.87%, respectively, ( Table 1). The GC content was slightly different from that observed in E. dicksonii (39.7%) [51]. The difference in GC content among separate species from the same genus may be due to the fact that various species have different codon use biases. The GC content in the IR regions was 43.17% in E. cymosa and 43.18% in E. obtusifolia, which was higher than the content in the regions of the SSC and LSC, possibly due to the fact that all the rRNAs are present in IR regions [52]. The IR regions may be more stable because of their high GC content in comparison to the LSC and SSC regions [53]. Both genomes consisted of 114 unique genes, which were divided into 80 protein-coding genes, 4 rRNA genes, and 30 tRNA genes (Table S1). In angiosperm cp genomes, intron composition is highly conserved [54], which is important for the control of gene expression [55]. In the E. cymosa and E. obtusifolia cp genomes, introns were identified in 18 genes, 6 of which were tRNA genes and 12 of which were protein-coding genes (Table S2).
The codon usage analysis revealed that the genes in the cp genomes of E. cymosa and E. obtusifolia were encoded by 27,513 and 27,393 codons, respectively. Codon usage plays a crucial role in gene expression [56], resulting in an association with gene expression levels, transcriptional selection, amino acid conservation, and GC content [57]. The majority of the codons were coding for leucine (Figure 3), and the codons in both genomes mostly had an RSCU value of <1. These results were similar to those previously found in relation to E. dicksonii [51]. RNA editing plays a vital role in the cp genome, which involves the alteration of nucleotides in the mRNA of functional genes [58]. The expression of functional proteins is influenced by this mechanism [59]. The RNA editing site analysis identified 31 editing sites in each genome, which were distributed within 16 protein-coding genes ( Figure 4). All base conversions were found in the first and second positions of the triplet codon, resulting in changes in the amino acids, which is consistent with previous studies [54]. The majority of amino acid conversions were from serine to leucine, which is consistent with the characteristics of RNA editing in angiosperm plants [47,60].
The arrangement and recombination of the cp genome may be significantly influenced by the regions and the numbers of the repeat sequences [61]. The long repeat sequence analysis revealed that palindromic and forward repeats were the most common repeats in E. cymosa and E. obtusifolia ( Figure 5), which is consistent with other angiosperm cp genome analyses [62][63][64][65][66]. The SSRs analysis showed that the cp genomes of E. cymosa and E. obtusifolia comprised 141 SSRs and 139 SSRs, respectively ( Table 2). The SSRs have been demonstrated to be important molecular markers in taxonomic research [67]. They have also been utilized in several kinds of studies, including those that analyze gene flow and estimate genetic variation among animal or plant genomes [68,69]. The majority of the SSRs were mononucleotides, among which the A/T repeats were the most common. Most of the SSRs found in angiosperm cp genomes usually contain poly thymine (polyT) or poly adenine (polyA) repeats rather than tandem cytosine (C) and guanine (G) repeats [67,70,71].
The IR/LSC and IR/SSC boundaries of the five Ehretia cp genomes were compared in the present study. The variations in genome length are linked to the contraction and expansion of IR regions [72,73] or gene deletions [74]. The variation in the IR/LSC and IR/SSC borders may respond to a number of phylogenetic signals, such as those in subtle Caryophyllales and Gentianinae species [75,76]. The results showed that genes located in the junctions of Ehretia cp genomes were well conserved: rps19 was found in IRb/LSC regions, ycf1 in IRb/SSC and SSC/IRa regions, ndhF in the SSC region, and trnH in the LSC region (Figure 7). The order of the genes in all regions was similar to that observed in some Boraginales taxa, such as Trigonotis (Boraginaceae s.str) [77]. The analysis of the sequence divergence regions revealed a relatively high diversity within Ehretia cp genomes. As observed in angiosperm cp genomes, genic regions are more conserved than intergenic regions [78][79][80]. However, a number of variable regions were observed in the psbA, matK, atpA, rpoC2, rpoB, rbcL, ndhD, and ycf1 genes (Figure 8). Several of these divergence markers have been used to study the evolutionary relationships among plant species [81,82]. The identification of these highly diverse regions in Ehretia cp genomes would be useful for use as species-specific DNA markers.
Understanding how the rate of substitution affects the modification of gene function and structure requires an analysis of the adaptive evolution of genes. Estimating the dN/dS ratio can provide details about the limitations that natural selection has placed on organisms [83,84]. The selective pressure rate analysis of the 80 protein-coding genes between the E. cymosa and E. obtusifolia cp genomes indicated that the dN/dS ratio was <1 in all the paired genes, except for ycf3 and ycf15, which were detected under a positive selection with dN/dS values > 1 (Figure 9). Further research on the functions of these genes is necessary since they may have a significant role in the adaptive evolution of the Ehretia species.
The phylogenetic relationships inferred from the results of the BI and ML analyses divided the order Boraginales into two well-supported clades ( Figure 10). The first clade included the family Boraginaceae and its two subfamilies, Boraginoideae and Cynoglossoideae, which is congruent with the findings of a previous study [85]. The second clade comprised three families, namely Ehretiaceae, Cordiaceae, and Heliotropiaceae. Moreover, Ehretiaceae was identified as a sister to Cordiaceae, which is again congruent with the findings of previous studies [22,51]. Our results support the recognition that the order Boraginales contains a number of distinct families, which is congruent with the findings of several molecular analyses in previous studies [1,19,51,86], but differs from the APG IV system view, which recognizes the order Boraginales to contain only a single family, that is, Boraginaceae, and several subfamilies [15].

Conclusions
In this study, we analyzed and compared the basic characteristics of the complete chloroplast genomes of two Ehretia species (E. cymosa and E. obtusifolia). Moreover, the base compositions, SSRs and long repeats, RNA editing sites, codon usages, and IR boundaries were identified and analyzed in these genomes. In the phylogenetic analysis, two clades in the order Boraginales were recognized, the first containing a single family (Boraginaceae) and the second including three families (Ehretiaceae, Cordiaceae, and Heliotropiaceae). The present results provide valuable insights into the evolutionary relationships within the order Boraginales. However, we suggest that the analysis of more cp genome sequences from other families in the order Boraginales (e.g., Wellstediaceae, Namaceae, Lennoaceae, Hydrophyllaceae, Hoplestigmataceae, Coldeniaceae, and Codonaceae) is necessary to expand our understanding of the evolutionary relationships within the order Boraginales.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/f14071486/s1, Table S1: Gene contents in Ehretia cymosa and Ehretia obtusifolia chloroplast genomes; Table S2: Exons and introns lengths in Ehretia cymosa and Ehretia obtusifolia chloroplast genomes; Table S3: Codon-anticodon recognition patterns and codon usage of the Ehretia cymosa chloroplast genome; Table S4: Codon-anticodon recognition patterns and codon usage of the Ehretia obtusifolia chloroplast genome; Table S5: Predicted RNA editing site in the Ehretia cymosa and Ehretia obtusifolia chloroplast genome; Table S6: Repeat sequences present in the Ehretia cymosa chloroplast genome; Table S7: Repeat sequences present in the Ehretia obtusifolia chloroplast genome; Table S8: Simple sequence repeats in the chloroplast genome of Ehretia cymosa; Table S9: Simple sequence repeats in the chloroplast genome of Ehretia obtusifolia.