Chloroplast Genomes Evolution and Phylogenetic Relationships of Caragana species

Caragana sensu lato (s.l.) includes approximately 100 species that are mainly distributed in arid and semi-arid regions. Caragana species are ecologically valuable for their roles in windbreaking and sand fixation. However, the taxonomy and phylogenetic relationships of the genus Caragana are still unclear. In this study, we sequenced and assembled the chloroplast genomes of representative species of Caragana and reconstructed robust phylogenetic relationships at the section level. The Caragana chloroplast genome has lost the inverted repeat region and was categorized in the inverted repeat loss clade (IRLC). The chloroplast genomes of the eight species ranged from 128,458 bp to 135,401 bp and contained 110 unique genes. All the Caragana chloroplast genomes have a highly conserved structure and gene order. The number of long repeats and simple sequence repeats (SSRs) showed significant variation among the eight species, indicating heterogeneous evolution in Caragana. Selective pressure analysis of the genes revealed that most of the protein-coding genes evolved under purifying selection. The phylogenetic analyses indicated that each section forms a clade, except the section Spinosae, which was divided into two clades. This study elucidated the evolution of the chloroplast genome within the widely distributed genus Caragana. The detailed information obtained from this study can serve as a valuable resource for understanding the molecular dynamics and phylogenetic relationships within Caragana.


Introduction
Caragana Fabr. belongs to the Fabaceae and includes approximately 100 species [1]. Its native distribution ranges from Eastern Europe to temperate Asia, including Central Asia, the Qinghai-Tibet Plateau, and East Asia [2]. The genus prefers arid and cold habitats, being mainly distributed in arid and semi-arid regions [3], where it is often the dominant plant in the community [4].
The taxonomy and phylogenetic relationships of the genus Caragana sensu lato (s.l.) are still unclear. Zhao [5] divided Caragana sensu stricto (s.s.) into five sections according to morphological characteristics (namely, Caragana, Spinosae, Longispinae, Jubatae, and Frutescentes). Recently, Duan et al. [6], using the nuclear ITS and plastid matK, psbA-trnH, and trnL-F markers, found support for placing the genera Calophaca and Halimodendron within Caragana, where they should be treated as section Calophaca and section Halimodendron. Calophaca Fisch. includes about eight species with a native range from European Russia to Xinjiang and Pakistan [7]. Halimodendron Fisch. ex DC. includes only one species, which is distributed in dry sand and saline soil in the Caucasus, Western Siberia, Central Asia, and the Tianshan Mountains. Both Caragana s.s. and Calophaca are shrubs. Caragana s.s. has paripinnate leaves; axillary flowers that are usually solitary but sometimes present in groups of two to five in a fascicle; and cylindrical or compressed legumes. Calophaca has imparipinnate leaves and racemes with four or more flowers (Figure 1). Halimodendron has paripinnate leaves; racemes with four or more flowers; inflated legumes; and thick valves [8].
The phylogenetic relationships among the sections of Caragana s.l. were poorly supported and could not be resolved using the rbcL, trnS-trnG, and ITS markers [4]. Moreover, the topology of the sections was inconsistent between studies. For example, section Caragana was supported as the basal group of Caragana by Zhang et al. [9], whereas section Frutescentes was supported as the basal group by Duan et al. [6]. Previous studies suffered from poor phylogenetic resolution due to the limited availability of DNA sequence data (rbcL, matK, trnS-trnG, trnL-F, psbA-trnH, and ITS) and the low genetic variability in the selected molecular markers [6,10,11], possibly attributable to the recent origin and rapid diversification of this genus. Overall, previous phylogenetic trees had low bootstrap values and low robustness because only a few molecular markers could be used, given the limitations of sequencing technology. Consequently, expanding the sampling of genetic information, such as by using complete chloroplast genomes, could provide a more robust phylogenetic tree and could improve investigations of the evolutionary history of Caragana at the section level. Furthermore, a well-supported phylogenetic framework will enable a better estimate of the divergence time of these genera.
Chloroplast genome sequences have been widely used to investigate phylogenetic relationships among closely related species due to their higher mutation rates, maternal inheritance, and lack of recombination [12-14]. The advancement of sequencing technologies has made complete chloroplast genome sequences cost-effective and easy to obtain, enabling the transition from gene-based phylogenetics to genome-based phylogenomics [13,15] and facilitating the investigation of evolutionary phenomena in plant species with greater efficiency. Recently, the chloroplast genomes of several Caragana s.s. species have been sequenced and annotated, focusing on chloroplast genome evolution [16-18].
To understand the evolutionary history and the dynamic evolution of Caragana, we sequenced the chloroplast genomes of representative species of Caragana. In this study, we specifically aimed to (1) deepen our understanding of Caragana evolution using chloroplast genomes and (2) reconstruct robust phylogenetic relationships in Caragana at the section level.

Analyses of Repeat Sequences
In the eight Caragana chloroplast genomes, we detected 1421 simple sequence repeats (SSRs), with the number of SSRs in each genome ranging from 156 to 222 (Figure 3). Six types of SSR were identified. Among them, mononucleotide repeats were the most frequent in all eight species, with proportions ranging from 39.62% (C. erinacea) to 67.57% (C. acanthophylla) (Figure 3a). Dinucleotide repeats were the second most frequent SSR type, with proportions ranging from 13.51% (C. acanthophylla) to 37.74% (C. erinacea). Caragana acanthophylla had the most trinucleotide repeats, followed by C. jubata with 14 trinucleotide repeats. The number of tetranucleotide repeats ranged from 14 (C. acanthophylla) to 26 (C. korshinskii). Some Caragana species did not have trinucleotides or hexanucleotides. The SSRs had a significant bias toward A/T bases (Figure 3b), with most mononucleotide repeats being A/T (54.54%) and most dinucleotide repeats being AT/TA (23.36%).
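As a rough illustration of how SSR tallies of this kind can be produced, the following Python sketch scans a sequence for perfect repeats of 1-6 bp motifs. The minimum repeat thresholds (10, 5, 4, 3, 3, 3) and the function names are assumptions made for illustration; they are not the settings or the software used in this study.

```python
import re
from collections import Counter

# Hypothetical minimum repeat counts per motif length (1-6 bp);
# these are illustrative defaults, not the study's parameters.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_reducible(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, m in min_repeats.items():
        # The lookahead lets the scan test every start position, so a
        # rejected (reducible) tract cannot hide a valid one that overlaps it.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (k, m - 1))
        last_end = 0
        for match in pattern.finditer(seq):
            start = match.start()
            tract, motif = match.group(1), match.group(2)
            if start < last_end or is_reducible(motif):
                continue
            hits.append((start, motif, len(tract) // k))
            last_end = start + len(tract)
    return sorted(hits)


# Toy sequence: one poly-A tract (12 bp) and one AT dinucleotide tract (6 repeats).
toy = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "CCG"
ssrs = find_ssrs(toy)
by_type = Counter(len(motif) for _, motif, _ in ssrs)
print(ssrs)
print(dict(by_type))
```

Counting the hits by motif length, as in `by_type`, gives the kind of per-type totals summarized in Figure 3a; tallying by motif sequence instead would give the base-composition view of Figure 3b.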

We detected 1296 long repeats, consisting of forward (F), palindromic (P), reverse (R), and complementary (C) repeats of 30-472 bp, in the eight chloroplast genomes (Figure 4a). The number and length of repeats varied among species. Caragana roborovskyi had the greatest number of repeats (314), followed by C. acanthophylla and C. erinacea (205), and C. jubata had the lowest (84). The number of long repeats differed significantly among the four long repeat types. Forward repeats were most common (1025), followed by palindromic repeats (238). Complementary repeats were only identified in C. jubata and C. roborovskyi. The analysis of forward, palindromic, and reverse repeat lengths revealed that the most common length class was 30-40 bp (Figure 4b-d), followed by 41-50 bp and 51-60 bp.

Genomic Sequence Divergence
The multiple sequence alignments generated with mVISTA revealed no evidence of genomic rearrangements or large inversions across the Caragana chloroplast genomes, as illustrated in Figure 5. These chloroplast genomes exhibited a high degree of conservation, both in terms of gene order and sequence identity. Notably, the coding regions and inverted repeat (IR) regions demonstrated a higher level of conservation compared to the non-coding regions and single-copy regions. The analysis further revealed that the intergenic spacer regions exhibiting the most variation were trnI-rpl23, rpl2-rps19, clpP-rps12, trnC-petN, trnT-trnL, and trnN-trnR. Additionally, the coding genes accD, rpoC2, and ycf1 demonstrated high levels of variation among the coding regions of the chloroplast genomes.

Codon Usage Pattern and Molecular Evolution
We analyzed the codon usage of the protein-coding genes in the chloroplast genomes of the eight Caragana species (Table S3). The number of codons ranged from 23,335 (Calophaca soongorica) to 23,750 (C. halodendron). We identified 64 codons encoding 20 amino acids, including three termination codons. Leucine (Leu) showed the highest abundance among the amino acids, accounting for 24,164 occurrences (12.89%) across all eight species. This was followed by isoleucine (Ile), with a frequency of 15,339 (8.18%). In contrast, methionine (Met) was represented by the fewest codons, totaling 5017 occurrences.
The results of relative synonymous codon usage (RSCU; Figure 6) indicate that 24 codons were used more frequently, with RSCU > 1. Notably, 23 of these 24 codons possessed A/U at the third nucleotide position, the exception being UUG. Conversely, most of the codons ending with G/C exhibited RSCU values lower than 1, suggesting less prevalent usage in the eight chloroplast genomes. The bias towards A/U at the third position of the codons was further evidenced by the A/U contents of the codons, with a mean value of 58.79% at the third codon position. Two codons, AUG for methionine (Met) and UGG for tryptophan (Trp), displayed RSCU values of 1.00, indicating no codon bias.

Molecular Evolution of the Caragana Chloroplast Genome
We assessed differences in evolutionary rate among Caragana species using dN, dS, and ω values computed for 74 protein-coding genes individually, for gene groups, and for the combination of all 74 protein-coding genes in each Caragana species (Figure 7 and Table S4). A t-test on the dN, dS, and ω values revealed notable variation among genes, indicating that molecular evolutionary rates differ across genes.

Among the different gene groups (Figure 7d), the rps group exhibited the highest dN value (0.106), followed by atp (0.063) and rpl (0.055), while the photosynthetic genes (psa and psb), ndh, and pet demonstrated the lowest dN values. There was less variation in dS among the different gene groups (Figure 7e). The ω value was <0.5 for all gene groups except the rps group, indicating purifying selection (Figure 7f). The ω value was >1 in the rps group in Calophaca soongorica, indicating positive selection within this group.

Phylogenetic Analysis
The genus Caragana includes seven sections: Caragana, Calophaca, Halimodendron, Spinosae, Longispinae, Jubatae, and Frutescentes. A total of 26 Caragana (including the former genera Halimodendron and Calophaca) chloroplast genomes were used for phylogenetic investigation to elucidate the phylogenetic relationships of Caragana at the section level. The species Astragalus tumbatsica, Astragalus mongholicus, Hedysarum taipeicum, and Hedysarum petrovii were used as outgroups in the analysis.
Both maximum likelihood (ML) and Bayesian inference (BI) approaches yielded consistent tree structures (Figure 8), with robust support for the monophyly of the genus Caragana in both analyses (bootstrap value (BS) = 100%; posterior probability (PP) = 1.0). The Caragana phylogenetic trees were strongly supported, indicating that the relationships among the Caragana species were well resolved based on the whole chloroplast genome sequences. These species of Caragana formed three clades with high support. Clade I, consisting of the section Jubatae, was the first group of this genus to diverge. The section Frutescentes formed Clade II and was sister to the rest of Caragana. The section Frutescentes further diverged into two groups. Clade III included two main groups; section Caragana was sister to section Spinosae and formed a group. The four sections of Calophaca, Halimodendron, Longispinae, and Spinosae formed a clade with high support (BS = 100%; PP = 1.0), and section Calophaca was sister to the other three sections. The section Spinosae did not form a clade and was divided into two groups.

Structure and Comparative Analysis of Caragana Chloroplast Genomes
In this study, we sequenced eight species (Figure 1) of Caragana, each from a different section, and we compared the chloroplast genome variations and mutations. The chloroplast genome length ranged from 128,458 bp to 135,401 bp, which is considerably shorter than those of other angiosperms [12,13,19]. The shorter genome lengths were due to the loss of the inverted repeat (IR) region (Figure 2). Caragana, as a member of the tribe Caraganeae, is categorized within the IR-lacking clade (IRLC) in the subfamily Papilionoideae of Fabaceae. The loss of the chloroplast genome IR region in IRLC taxa, which is recognized as a robust phylogenetic indicator within the clade, has been consistently validated in prior research [20-22].
The loss of the IR region likely leads to increased mutation rates, facilitated genome rearrangements, and altered selection pressures and intron structure, significantly accelerating the rate of evolution of the chloroplast genome [23]. Numerous studies have shown gene and intron loss, such as the loss of introns from the rps12 and clpP genes in chickpea (Cicer arietinum) [24], loss of the accD gene in Trifolium [25,26], and loss of rpl22, rps16, and one intron of clpP in Vicia [27]. In this study, the rpl22 and infA genes were found to be lost from the Caragana chloroplast genomes (Figure 2). The infA gene is a notably labile chloroplast gene, and there are multiple independent cases of its loss or transfer to the nucleus in flowering plants [28]. The gene rpl22, encoding ribosomal protein CL22, has also been lost from chloroplast genomes and translocated to the nucleus [29].
The significant difference in chloroplast genome size between Caragana and other angiosperms indicates that IRLC chloroplast genomes may have undergone different evolutionary processes. Compared to other angiosperm species, the GC content in the Caragana chloroplast genome is lower because the IR region, which Caragana lacks, generally displays higher GC content than other regions of the chloroplast genome. This disparity is primarily attributed to the presence of four rRNA genes with high GC content (50-56.4%) in the IRs. Similar to GC content, codon usage biases also differed from those of species that retain both IR regions, indicating significant molecular evolutionary processes. Notably, Caragana chloroplast genomes exhibited similar GC content and codon usage bias to other IRLC species [30].
Repeat sequences play a crucial role in genome rearrangements and serve as significant markers in evolutionary studies [31]. We identified four types of long repeats and SSRs. Forward repeats were the most frequent in the Caragana chloroplast genomes, and complementary repeats were the least frequent (Figure 4). There are minor variations in the numbers and types of repeats among closely related species (Figure 5). Nonetheless, the number of repeats displayed noteworthy variability across the eight Caragana species, indicating heterogeneous evolution in the Caragana chloroplast genomes. SSRs exhibit high levels of polymorphism at the species level. They have been extensively studied in the field of population genetics, and variation has been documented [32,33]. In the current study, mononucleotide repeats were highly abundant, predominantly comprising A/T repeats rather than G/C repeats (Figure 3). This strong bias toward A/T in SSR loci was also noted in other legumes, such as Albizia julibrissin [34] and Desmodium styracifolium [35].

Phylogenetic Relationships among the Sections of Caragana
We reconstructed a robust phylogenetic tree with high bootstrap values using chloroplast genomes. The findings do not support the results of Zhao [5], who classified Caragana into three subgenera. The results of the current study support the monophyly of Caragana and the classification of Calophaca and Halimodendron as sections [6]. Previous studies have shown that the monophyly of sections Longispinae, Spinosae, Frutescentes, and Jubatae is difficult to recover based on a small number of molecular markers [11], but we recovered the monophyly of sections Frutescentes and Jubatae. Our results strongly suggest that Caragana be split into two clades with seven sections. Section Jubatae was found to be sister to the most recent common ancestor of the other six sections and was thus identified as the basal clade. The next to diverge was section Frutescentes. The sister relationship of sections Longispinae and Halimodendron was only moderately supported (bootstrap value = 87). The monophyly of section Spinosae was not recovered. Future studies with nuclear genetic data representing biparental inheritance and more complete species sampling will be required [36], as will further study of the cause of this conflict through comparison of the phylogenetic relationships inferred from nuclear genes and plastids.
Our phylogenetic results are consistent with the existing morphological taxonomic consensus [5]. Section Calophaca has some unique traits, such as imparipinnate leaves and racemes with glandular trichomes, and we speculate that these traits are derived characteristics. Section Halimodendron has racemes, pale purple or purplish red corollas, inflated legumes, and thick valves. It has a strong tolerance to drought and salinity, and its habitats vary greatly from those of other species in the genus Caragana. We propose that section Calophaca likely experienced a unique adaptive evolutionary history. The leaf morphology of Caragana is diverse and is mainly divided into paripinnate (4-20-foliolate) and digitate (4-foliolate). All members of section Frutescentes have digitate leaves. Section Frutescentes also has the highest species richness, accounting for 36% of Caragana species, which is likely related to trait innovation. Section Caragana has caducous rachis, pinnate leaves, and 4-10 pairs of leaflets. Section Jubatae has persistent rachis, pinnate leaves, and three to seven pairs of leaflets and is mostly distributed in high-elevation regions. The leaflets of section Spinosae are pinnate on the rachis of long branches, digitate or pinnate on the rachis of short branches, and are often found in two to three pairs. Section Longispinae has three to nine leaflet pairs and generally either long branches with persistent rachis or short branches with caducous rachis.

Plant Material, DNA Extraction, and Sequencing
According to the taxonomic system of Caragana, six Caragana s.s. species, one Halimodendron species (Halimodendron halodendron), and one Calophaca species (Calophaca soongorica) were selected for this study (Figure 1), representing all the eight sections of Caragana. Fresh leaves were collected from various regions in China, including Xinjiang, Xizang, Gansu, Sichuan, Hebei, and Beijing. The fresh leaves were dried with silica gel. Details on the material are provided in Table S1. The plant materials were identified by Prof. Zhixiang Zhang (Beijing Forestry University). Voucher specimens were deposited at the museum of Beijing Forestry University (BJFC) under voucher specimen numbers BJFC00024602, BJFC00120870, BJFC00120894, ZJK05210301, FS05270301, and SC060501.
Genomic DNA was extracted following the methods of Li et al. [37]. The quality and quantity of extracted DNA were evaluated with a Qubit 3.0 Fluorometer (Thermo Fisher Scientific, Waltham, MA, USA). Then, 300 ng of DNA from each sample was randomly sheared into ~350 bp fragments using a Covaris ultrasonic instrument. The paired-end sequencing library was generated and sequenced on the DNBSEQ-T7 platform to generate 150 bp paired-end reads. For each sample, approximately 5 GB of raw data was obtained.

Chloroplast Genome Assembly and Annotation
Trimmomatic 0.36 [38] was used to quality-filter the raw data, addressing adapter-contaminated reads and low-quality bases. The chloroplast genomes were de novo assembled with GetOrganelle v1.6.2 [39] with a k-mer length of 95. If GetOrganelle was unable to assemble a complete chloroplast genome, the method of Dong et al. [13,40] was used to complete the assembly. Plastid Genome Annotator (PGA) Version 3 [41] was used to annotate the newly sequenced chloroplast genomes. Geneious Prime was used to check and revise the annotation errors manually. The online software Chloroplot (https://irscope.shinyapps.io/Chloroplot/, accessed on 21 May 2024) was used to draw the physical maps.

Comparative Genomic Analyses and Molecular Evolution Analysis
Structural variation among the Caragana chloroplast genomes was examined using mVISTA (https://genome.lbl.gov/vista/mvista/submit.shtml, accessed on 23 May 2024) [43] in Shuffle-LAGAN mode. The chloroplast genome of Caragana erinacea was used as the reference sequence.
We calculated relative synonymous codon usage (RSCU), which is the ratio of the observed frequency of a specific codon to the frequency expected if all synonymous codons for that amino acid were used equally, with CodonW v1.4.2. The R package Pvclust was used to assess the uncertainty in hierarchical cluster analysis [44]. The RSCU values were then plotted as a heatmap using TBtools 1.116 [45].
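To make the RSCU definition concrete, the following Python sketch computes RSCU from a list of coding sequences as the observed codon count multiplied by the number of synonymous codons and divided by the total count for that amino acid. It is a minimal illustration rather than a reimplementation of CodonW; the function name `rscu`, the toy sequence, and the choice to treat the three stop codons as one synonymous family are assumptions.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code in TCAG order; the plastid code (NCBI table 11)
# assigns the same amino acids to the 64 codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds_list):
    """Return {codon: RSCU} for in-frame coding sequences (DNA or RNA)."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by encoded amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input: RSCU undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy coding sequence: Met, Leu (TTA), Leu (TTA), Leu (TTG), Trp, stop.
toy_values = rscu(["ATGTTATTATTGTGGTAA"])
for codon in ("ATG", "TGG", "TTA", "TTG", "CTT"):
    print(codon, round(toy_values[codon], 2))
```

Because methionine and tryptophan are each encoded by a single codon, their RSCU is 1 by construction, which is why AUG and UGG always show no codon bias under this measure.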
We computed the non-synonymous substitution rate (dN), synonymous substitution rate (dS), and dN/dS ratio (ω) to investigate the molecular evolution within the Caragana chloroplast genomes. These measures allow evolutionary rates and the impact of natural selection on molecular evolution to be examined. Hedysarum taipeicum served as the reference species in these analyses.
We extracted all protein-coding genes with PhyloSuite v1.2.3 according to the genome annotation [46] and aligned them with MAFFT v7.471. The dN, dS, and ω values were determined using the YN00 method in PAML v4.10.3 [47]. These calculations were conducted for groups of genes sharing identical functions [48] and, at the species level, for all coding genes combined for each species.
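The way ω is commonly read can be sketched in a few lines of Python: values below 1 suggest purifying selection, values near 1 suggest neutral evolution, and values above 1 suggest positive selection. The function names, the tolerance around 1, and the example dN/dS pairs below are hypothetical and are not taken from Table S4 or from PAML output.

```python
import math


def omega(dn, ds):
    """Return dN/dS, or NaN when dS is zero and the ratio is undefined."""
    if ds == 0:
        return math.nan
    return dn / ds


def selection_regime(w, tol=0.05):
    """Label an omega value; `tol` is an assumed band around neutrality."""
    if math.isnan(w):
        return "undefined"
    if w > 1 + tol:
        return "positive selection"
    if w < 1 - tol:
        return "purifying selection"
    return "approximately neutral"


# Hypothetical (dN, dS) pairs, for illustration only.
examples = {"group_A": (0.010, 0.200), "group_B": (0.120, 0.100)}
for name, (dn, ds) in examples.items():
    w = omega(dn, ds)
    print(f"{name}: omega = {w:.2f} -> {selection_regime(w)}")
```

A ratio alone does not establish statistical significance, so a label such as "positive selection" here is only a first-pass reading of the estimate.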

Phylogenetic Analysis
The phylogenetic relationships of Caragana were reconstructed using the chloroplast genome sequences. In addition to the chloroplast genomes sequenced in this study, complete Caragana chloroplast genomes publicly available in NCBI were used (Table S2). Four species from two genera were designated as outgroups according to a previous phylogenetic study [49]. We aligned the whole chloroplast genome sequences with MAFFT [50] using default parameters and identified unreliable alignment regions with trimAl v1.4.1 [51].
We performed the phylogenetic analyses using two methods: maximum likelihood (ML) and Bayesian inference (BI). ModelFinder v1.5.4 was used to determine the best substitution model [52]. The ML tree was reconstructed using IQ-TREE v2.0.3 [53] with 1000 bootstrap replicates. The BI tree was inferred using MrBayes 3.2 [54]. Four Markov chain Monte Carlo (MCMC) chains were run simultaneously and sampled every 1000 generations for a total of two million generations. The first 25% of the sampled trees were discarded as burn-in. We used Tracer v1.7 [55] to assess the effective sample size of each parameter and the convergence to a stationary distribution, and we estimated posterior probabilities using a 50% majority-rule consensus tree.
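The sampling scheme above implies a simple count of trees per run, sketched here in Python. The function name is hypothetical, and the calculation ignores the starting state of the chain, which some programs also record, so the exact totals in a real run may differ by one.

```python
def retained_samples(generations, sample_every, burnin_fraction):
    """Return (sampled, discarded, kept) tree counts for one MCMC run."""
    sampled = generations // sample_every          # samples taken after generation 0
    discarded = int(sampled * burnin_fraction)     # burn-in samples removed
    return sampled, discarded, sampled - discarded


# Settings stated in the text: 2,000,000 generations, sampling every 1000,
# and the first 25% discarded as burn-in.
sampled, discarded, kept = retained_samples(2_000_000, 1000, 0.25)
print(sampled, discarded, kept)
```

Under these settings roughly 2000 trees are sampled per run, about 500 are discarded, and about 1500 contribute to the consensus tree and posterior probabilities.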

Figure 2. Gene map of the chloroplast genomes of Caragana species. Genes outside the ring are transcribed in a counterclockwise direction, whereas those inside the ring are transcribed in a clockwise direction. The GC content of the chloroplast genome is shown in the dark gray region, and the AT content is shown in the light gray region. Calophaca soongorica is abbreviated as Ca. soongorica.

Figure 3. Number and type of simple sequence repeats (SSRs) in the chloroplast genomes of eight Caragana species. (a) Number of six types of SSRs. (b) Number of base types of SSRs.

Figure 4. Number and type of dispersed repeats in the chloroplast genomes of eight Caragana species. (a) Number and type of long repeats in dispersed repeats. (b) Length and frequency of forward long repeats. (c) Length and frequency of palindromic long repeats. (d) Length and frequency of reverse long repeats.

Int. J. Mol. Sci. 2024, 25, x FOR PEER REVIEW

Figure 5. Sequence similarity plots of eight chloroplast genomes of Caragana, constructed using mVISTA (C. erinacea was used as the reference sequence). The proportion of sequence identity is displayed on the y-axis and ranges from 50% to 100%. The x-axis indicates the coordinates of the chloroplast genomes. Genomic regions are color-coded as protein-coding (exon), intron, intergenic spacer (IGS), and rRNA. Genes are indicated by gray arrows.

Figure 7. The non-synonymous substitution rate (dN), synonymous substitution rate (dS), and dN/dS ratio (ω), computed for eight species and gene groups using PAML. (a-c) dN, dS, and ω values of eight species. (d-f) dN, dS, and ω values of gene groups across the eight species.

Figure 8. Phylogenetic tree constructed using the maximum likelihood (ML) and Bayesian inference (BI) methods with chloroplast genomes. The number on the line represents the ML bootstrap support

Table 1. Features of the chloroplast genomes of eight Caragana species.