Article

Comprehensive Analysis of Chloroplast Genomes in Leguminous Forage Species: Codon Usage, Phylogenetic Relationships, and Evolutionary Insights

by Rui Yang, Ying Xue, Xiaofan He and Tiejun Zhang *
School of Grassland Science, Beijing Forestry University, Beijing 100083, China
* Author to whom correspondence should be addressed.
Agronomy 2025, 15(4), 765; https://doi.org/10.3390/agronomy15040765
Submission received: 24 February 2025 / Revised: 11 March 2025 / Accepted: 18 March 2025 / Published: 21 March 2025
(This article belongs to the Special Issue Genetics and Breeding of Field Crops in the 21st Century)

Abstract

Leguminous forages play critical roles in sustainable agriculture and ecosystem management by enhancing soil fertility through nitrogen fixation and providing high-quality protein for livestock. In this study, we sequenced and assembled the chloroplast genome of Thermopsis alpina using high-throughput sequencing. Together with the chloroplast genomes of 29 other leguminous forage species obtained from the NCBI database, we conducted a comprehensive analysis of the chloroplast genomes of 30 species, focusing on their codon usage patterns, phylogenetic relationships, and evolutionary dynamics. The chloroplast genome of Thermopsis alpina has a typical quadripartite structure with a total length of 153,714 bp and encodes 124 genes; it comprises a large single-copy region (LSC, 83,818 bp), a small single-copy region (SSC, 17,558 bp), and two inverted repeat regions (IRs, 26,169 bp each). Relative synonymous codon usage (RSCU) analysis identified 29 commonly preferred codons, 28 of which end in A/U, with a notable preference for the leucine codon UUA across all species. The effective number of codons (ENC) and PR2 plot analyses indicate weak codon usage bias, shaped primarily by selection pressure rather than mutational pressure. Simple sequence repeats (SSRs) were concentrated in intergenic regions, suggesting a potential role in genome stability and evolution. A phylogenetic tree constructed from chloroplast genome data further clarifies the genetic relationships and evolutionary trajectories of these leguminous forage species. Overall, these findings provide insights into the molecular evolution of leguminous forages and a theoretical basis for their improved use in sustainable agriculture and ecological restoration.

1. Introduction

Leguminous forages are among the most widely utilized and ecologically significant plants in agriculture and ecosystem management [1]. Thermopsis alpina is a perennial herbaceous species of the genus Thermopsis (Fabaceae) that grows mainly in regions such as Xinjiang and Inner Mongolia in China [2]. Through its nitrogen-fixing symbiosis with rhizobia, it occupies a critical ecological niche in alpine grasslands [3]. The agricultural significance of legumes is further emphasized by their use in pasture systems, soil erosion control, and ecological restoration projects. Forage legumes such as alfalfa (Medicago sativa) and clover (Trifolium pratense) are particularly important because their high protein content makes them invaluable livestock feed [4,5]; they play essential roles in livestock production and are cultivated across diverse agroecosystems, from temperate to tropical regions. Biological nitrogen fixation reduces the need for synthetic fertilizers, making legumes indispensable in organic farming and other sustainable agricultural practices [6]. Leguminous forages also improve soil structure and prevent erosion, providing essential ecosystem functions, particularly in regions susceptible to desertification and other environmental challenges. Understanding the molecular and genetic foundations of these plants is therefore crucial for optimizing their use in agricultural systems and enhancing their ecological benefits.
The chloroplast genome, maternally inherited in most plants, provides unique insights into plant evolutionary relationships and functional adaptation. Chloroplasts are specialized organelles that are crucial for photosynthesis and play a central role in plant metabolism, environmental adaptation, and molecular evolution [7]. The chloroplast genome (cpDNA) encodes essential proteins for photosynthesis and other metabolic processes [8]. Although much smaller than the nuclear genome, it is critical for cellular functions such as energy production [9], carbon fixation [9], and fatty acid synthesis [10]. Chloroplast genome structure is relatively conserved across plant species, but variation exists, particularly among families, genera, and species. Owing to their conserved nature, certain chloroplast regions are frequently used as DNA barcodes. The noncoding regions of the chloroplast genome are rich in simple sequence repeats (SSRs) and intergenic spacer (IGS) sequences, and sequence variation in these regions provides molecular markers that are widely used to resolve relationships among varieties [11]. In angiosperms, chloroplasts are typically inherited maternally [12], and chloroplast genome size and gene number vary considerably across species. The chloroplast genome has therefore become an effective molecular tool for germplasm identification and germplasm development and has been widely used in species identification [13], phylogenetic analysis [14], and studies of species evolution [15,16].
In recent years, with the rapid development of high-throughput sequencing, an increasing number of legume chloroplast genomes have been analyzed, and new progress has been made in the study of forage legume chloroplast genomes. Characterization of the chloroplast genomes of five Medicago species provided a theoretical basis for investigating the relationships among these species [17]. A previous study reconstructed the phylogeny of alfalfa and its relatives from chloroplast genome and ITS sequences and found that the genetic differentiation pattern of Medicago species in China largely coincided with the traditional sections Medicago, Spirocarpos, Platycarpae, and Lupularia [18]. These findings show that the chloroplast genome is a valuable molecular tool for studying the evolution of forage legumes, offering insights into their origin and evolutionary position.
In addition to chloroplast genome structure, a critical focus of genomic analysis is codon usage bias [19]. Codons link mRNA to protein and are essential for transmitting genetic information. In higher plants, there are 64 codons: 3 are stop codons, and the remaining 61 encode the 20 amino acids [20]. Except for methionine and tryptophan, each of which has a single codon, amino acids are encoded by 2–6 codons [21]. Codons encoding the same amino acid are termed synonymous codons [22]. Synonymous codons are not used uniformly: different species, and even different genes within the same species, show different codon preferences, a widespread phenomenon known as codon usage bias [19,22,23]. This bias can influence translation efficiency, protein folding, and overall cellular function. Codon usage patterns vary across organisms and, within the same organism, can differ between tissues or under different environmental conditions [24]. Increasing evidence indicates that codon usage preferences are closely related to gene function, expression level, and the organism's environment [25]. In legumes, an analysis of highly expressed genes in the chloroplast genome of Medicago ruthenica showed that A/U-rich codons were used more frequently in these genes [26], and a study of the chloroplast genomes of nine forage legumes identified seven high-frequency codons ending in A/T and concluded that natural selection is the main influencing factor [27].
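The degeneracy described above can be checked directly against the standard genetic code. The minimal Python sketch below builds the standard codon table from a compact 64-letter string (an illustrative encoding; the names CODON_TABLE and synonymous_families are chosen here and are not from the study) and groups the 61 sense codons into synonymous families.

```python
from collections import defaultdict

# Standard genetic code: codons are enumerated with the first, second and
# third bases each cycling through U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def synonymous_families(table):
    """Group sense codons by the amino acid they encode (stop codons excluded)."""
    families = defaultdict(list)
    for codon, aa in table.items():
        if aa != "*":
            families[aa].append(codon)
    return dict(families)


families = synonymous_families(CODON_TABLE)
stops = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")

# Tally how many amino acids fall into each degeneracy class.
degeneracy = defaultdict(int)
for codons in families.values():
    degeneracy[len(codons)] += 1

print(len(families), sum(len(c) for c in families.values()), stops)
# 20 61 ['UAA', 'UAG', 'UGA']
print(dict(sorted(degeneracy.items())))
# {1: 2, 2: 9, 3: 1, 4: 5, 6: 3}
```

The tally reproduces the statement in the text: methionine and tryptophan form the only single-codon class, and the other 18 amino acids have 2, 3, 4, or 6 synonymous codons.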
A study of codon usage in the chloroplast genome of Bothriochloa ischaemum found that all 15 optimal codons end in A or U and that mutational pressure is more likely to shape these preferences [28]. Compared with cultivated legumes such as alfalfa (Medicago sativa), however, studies of the forage potential of T. alpina remain limited, leaving a research gap regarding its use as a forage resource in high-altitude ecosystems. Reports on codon preference in legume chloroplast genomes have also covered only a few species, highlighting the need for further investigation of codon usage patterns in plants, which could support the construction of stable transgenic systems.
In this study, we sequenced and assembled the chloroplast genome of Thermopsis alpina, enriching the genetic resources available for this taxon, and analyzed the chloroplast genomes of 30 leguminous forage species to explore their codon usage, phylogenetic relationships, and evolutionary trajectories. Our aims were to clarify the chloroplast genome characteristics of Thermopsis alpina and to reveal how codon usage bias in the chloroplast genomes of different forage legumes has been shaped during evolution, thereby laying a theoretical foundation for the classification, development, and utilization of forage legume resources.

2. Materials and Methods

2.1. Taxon Sampling, DNA Extraction and Library Sequencing

Fresh leaves of Thermopsis alpina were collected from healthy plants in the experimental field of Beijing Forestry University, rapidly frozen in liquid nitrogen, and stored at −80 °C until DNA extraction. DNA was extracted using a commercial DNA extraction kit (Tiangen Biochemical Technology Co., Beijing, China) following the manufacturer's protocol. DNA quality was checked by 1% agarose gel electrophoresis, and concentrations were determined using a NanoDrop 2000 (Thermo Fisher Scientific Inc., Waltham, MA, USA) and a Qubit 3.0 fluorometer (Thermo Fisher Scientific Inc., Waltham, MA, USA). High-quality genomic DNA was used for library construction, and the libraries were sequenced on the DNBSEQ-T7 platform, generating approximately 20 Gb of raw sequencing data. The complete chloroplast genome sequence of Thermopsis alpina was deposited in GenBank on 11 March 2025 (https://www.ncbi.nlm.nih.gov/nuccore/, accession number PV262343). The chloroplast DNA (cpDNA) sequences of the other 29 species used in this study were obtained from the NCBI database (https://www.ncbi.nlm.nih.gov/nuccore/, accessed on 15 September 2024). Twelve of the 30 species are toxic. Details of the chloroplast genomes are shown in Table 1.

2.2. Chloroplast Genome Assembly and Annotation

Chloroplast genome assembly was performed using GetOrganelle (v1.7.4.1) [29] and NOVOPlasty (v4.3.3) [30]. The genome was annotated using the online tools CPGAVAS2 (http://47.96.249.172:16019/analyzer/home, accessed on 25 September 2024) [31] and GeSeq (Annotation of Organellar Genomes; https://chlorobox.mpimp-golm.mpg.de/geseq.html, accessed on 30 September 2024).

2.3. Codon Bias and SSR Locus Analysis

Relative synonymous codon usage (RSCU) is the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally [19]. RSCU values were calculated using Python (v3.12.2). Microsatellites, also known as short tandem repeats (STRs) or simple sequence repeats (SSRs), are simple repetitive sequences widely distributed throughout eukaryotic genomes [32]. They consist of tandem repeats of short nucleotide motifs, typically 1–6 bp long. SSRs were identified using MISA-web [33], with the minimum numbers of repeats set to 10, 5, 4, 3, 3, 3, 3, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, hexa-, hepta-, octa-, nona-, and decanucleotide motifs, respectively.
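RSCU for codon j of amino acid i is commonly written as RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij), where X_ij is the count of that codon and n_i is the number of synonymous codons for amino acid i; values above 1 mark preferred codons. The study's scripts are not given, so the following is only a minimal sketch of that formula under stated assumptions: the function name rscu is hypothetical, the input is a single in-frame coding sequence (in practice CDSs would typically be concatenated per genome), and amino acids absent from the sequence are skipped.

```python
from collections import Counter, defaultdict

# Standard genetic code ("*" = stop), enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def rscu(cds):
    """Return {codon: RSCU} for the sense codons of an in-frame coding sequence.

    RSCU = observed count / (total count of the amino acid / number of synonyms).
    Hypothetical helper; codons with ambiguous bases (e.g. N) are ignored.
    """
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = total / len(codons)  # count expected if synonyms were used equally
        for c in codons:
            values[c] = counts[c] / expected
    return values


values = rscu("ATGTTATTATTGCTTAAATAA")
preferred = sorted(c for c, v in values.items() if v > 1)
print({c: round(values[c], 2) for c in ("UUA", "UUG", "CUC", "AAA")})
# {'UUA': 3.0, 'UUG': 1.5, 'CUC': 0.0, 'AAA': 2.0}
print(preferred)
# ['AAA', 'CUU', 'UUA', 'UUG']
```

In the toy sequence, leucine is used four times across six synonyms, so each leucine codon is expected 4/6 times; the two UUA codons therefore score 3.0, and the stop codon UAA is excluded from the output.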

2.4. Effective Number of Codons (ENC) Plot Analysis

The effective number of codons (ENC) measures codon usage bias and ranges from 20 (one codon used per amino acid) to 61 (all synonymous codons used equally) [34]. ENC plots were constructed with GC3s values on the x-axis and ENC values on the y-axis to examine the factors influencing codon usage and the relationship between base composition and codon usage bias. To quantify the difference between the observed (ENCobs) and expected (ENCexp) values, the ENC ratio, (ENCexp − ENCobs)/ENCexp, was calculated for each gene and its frequency distribution was examined [34]. The standard curve in the plot represents the ENC expected when codon usage is shaped only by GC3s composition, ENCexp = 2 + GC3s + 29/[GC3s² + (1 − GC3s)²], whereas the observed ENC of a gene is calculated from the average homozygosity F_k of its k-fold degenerate amino acid classes as follows:
$$\mathrm{ENC} = 2 + \frac{9}{F_2} + \frac{1}{F_3} + \frac{5}{F_4} + \frac{3}{F_6}$$
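The ENC quantities above can be sketched in Python as follows. This is an illustrative implementation of Wright's formula, not the authors' code: the function names are hypothetical, it assumes the standard genetic code and a single in-frame CDS, computes the homozygosity of each amino acid as F = (n Σp² − 1)/(n − 1), raises an error when a degeneracy class is unobserved instead of estimating the missing value, and caps ENC at 61.

```python
from collections import Counter, defaultdict

# Standard genetic code ("*" = stop), enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Sense codons grouped by amino acid.
FAMILIES = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        FAMILIES[_aa].append(_codon)

# Number of amino acids in each degeneracy class: the numerators 9, 1, 5, 3.
CLASS_WEIGHTS = {2: 9, 3: 1, 4: 5, 6: 3}


def homozygosity(counts):
    """Wright's F for one amino acid; None if fewer than two codons observed."""
    n = sum(counts)
    if n < 2:
        return None
    return (n * sum((x / n) ** 2 for x in counts) - 1) / (n - 1)


def enc(cds):
    """Observed ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    f_by_class = defaultdict(list)
    for codons in FAMILIES.values():
        if len(codons) == 1:
            continue  # Met and Trp contribute the constant 2
        f = homozygosity([counts[c] for c in codons])
        if f is not None:
            f_by_class[len(codons)].append(f)
    total = 2.0
    for k, weight in CLASS_WEIGHTS.items():
        if not f_by_class[k]:
            raise ValueError(f"no {k}-fold degenerate amino acid observed")
        f_mean = sum(f_by_class[k]) / len(f_by_class[k])
        total += weight / f_mean if f_mean > 0 else float("inf")
    return min(total, 61.0)


def expected_enc(gc3s):
    """Standard curve: ENC expected when GC3s alone shapes codon usage."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def enc_ratio(enc_obs, gc3s):
    """(ENCexp - ENCobs) / ENCexp for one gene."""
    exp = expected_enc(gc3s)
    return (exp - enc_obs) / exp


# Maximal bias: every degenerate amino acid uses a single codon, so every F = 1.
biased = "".join(codons[0] * 2 for codons in FAMILIES.values() if len(codons) > 1)
print(enc(biased))           # 20.0
print(expected_enc(0.5))     # 60.5
print(enc_ratio(60.5, 0.5))  # 0.0
```

The toy example recovers the lower bound of 20 stated in the text; a gene lying exactly on the standard curve has an ENC ratio of 0, and positive ratios mean the observed ENC falls below the curve.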

2.5. Parity Rule 2 Bias Plot (PR2 Plot) Analysis

PR2 plots were used to examine the usage of the four nucleotides at the third position of amino-acid-encoding codons. For each gene, G3/(G3 + C3) and A3/(A3 + T3) were calculated, and the PR2 plot was constructed with G3/(G3 + C3) on the x-axis and A3/(A3 + T3) on the y-axis. The center of the plot, (0.5, 0.5), corresponds to A = T and G = C, that is, no bias at the third codon position, while other points indicate the degree and direction of deviation from this center [35]. If codon usage is shaped only by mutational pressure, A and T, and likewise G and C, are expected to be used in equal proportions at the third codon position [36].
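A minimal Python sketch of the PR2 coordinates for one gene is given below; pr2_coordinates and pr2_quadrant are hypothetical helpers. Following the description above, the sketch uses the third base of every amino-acid-encoding codon and skips stop codons; many PR2 studies restrict the calculation to fourfold-degenerate codon families, a choice the text does not specify.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def pr2_coordinates(cds):
    """Return (G3/(G3+C3), A3/(A3+T3)) for an in-frame coding sequence."""
    seq = cds.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    third = [codon[2] for codon in codons if codon not in STOP_CODONS]
    a, t, g, c = (third.count(base) for base in "ATGC")
    if g + c == 0 or a + t == 0:
        raise ValueError("third positions lack G/C or A/T bases")
    return g / (g + c), a / (a + t)


def pr2_quadrant(x, y):
    """Label a point relative to the (0.5, 0.5) centre of the PR2 plot."""
    horizontal = "G>C" if x > 0.5 else ("C>G" if x < 0.5 else "G=C")
    vertical = "A>T" if y > 0.5 else ("T>A" if y < 0.5 else "A=T")
    return horizontal, vertical


x, y = pr2_coordinates("GCTGCGGCGGCTTAA")
print(x, y, pr2_quadrant(x, y))
# 1.0 0.0 ('G>C', 'T>A')
```

A point in the lower-right region (x > 0.5, y < 0.5) corresponds to the G- and T-favoring pattern described in the PR2 results.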

2.6. Analysis of the Neutrality Plot

Neutrality analysis was used to assess the relative influence of mutational pressure and natural selection on codon bias [37]. GC12 is the mean of GC1 and GC2 (the GC content at the first and second codon positions), and GC3 is the GC content at the third codon position. Neutrality plots were constructed by plotting each gene with GC3 on the horizontal axis and GC12 on the vertical axis. A significant correlation between GC12 and GC3, with a regression coefficient close to 1, suggests that mutational pressure is the primary determinant of codon usage, whereas a non-significant correlation with a regression coefficient near 0 implies that natural selection predominates [38].

2.7. Correspondence Analysis (COA)

Correspondence analysis (COA) is a multivariate statistical method for analyzing relationships between variables and samples [39]. To capture variation in codon usage, orthogonal axes were generated from 59 codons, excluding the single codons for methionine and tryptophan and the 3 stop codons. Axis 1 was plotted against axis 2 in R, and the resulting plot was used to further investigate the factors influencing codon usage preference [40].
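The analysis above was run in R; purely for illustration, the core of correspondence analysis can be sketched in Python with NumPy. The genes × codons table (for example, RSCU values for the 59 informative codons) is converted to a matrix of standardized residuals, whose singular value decomposition yields the orthogonal axes, the share of variation (inertia) explained by each axis, and the gene coordinates plotted on axes 1 and 2. The function name and toy table are hypothetical, and rows or columns that sum to zero must be removed first.

```python
import numpy as np


def correspondence_analysis(table):
    """Return (row principal coordinates, fraction of inertia per axis).

    table: non-negative array, rows = genes, columns = codons; every row and
    column must have a positive sum.
    """
    n = np.asarray(table, dtype=float)
    p = n / n.sum()                                  # correspondence matrix
    r = p.sum(axis=1)                                # row masses
    c = p.sum(axis=0)                                # column masses
    expected = np.outer(r, c)
    residuals = (p - expected) / np.sqrt(expected)   # standardized residuals
    u, sv, _ = np.linalg.svd(residuals, full_matrices=False)
    inertia = sv ** 2
    explained = inertia / inertia.sum()
    rows = (u * sv) / np.sqrt(r)[:, None]            # principal coordinates
    return rows, explained


toy = np.array([
    [10, 2, 3, 1],
    [2, 10, 3, 1],
    [3, 3, 10, 2],
    [5, 5, 5, 5],
])
coords, explained = correspondence_analysis(toy)
print(np.round(100 * explained[:2], 2))  # % of variation on axes 1 and 2
print(np.round(coords[:, :2], 3))        # gene coordinates for the axis 1-2 plot
```

Genes whose coordinates sit near the origin have codon usage close to the average profile, while peripheral genes deviate most, which matches how the COA plots are read in the Results.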

2.8. Collinearity Analysis and Phylogenetic Tree Construction of 30 Leguminous Forages

BLAST (v2.14.0+) was used to compare the chloroplast genome sequences against a reference database, identifying regions of similarity and divergence [41]. The chloroplast genome sequences of the 30 leguminous forages were aligned using MAFFT (v7.450) [42]. The phylogenetic tree was constructed on the basis of average nucleotide identity (ANI) calculated with PYANI (v0.2.12) (https://pyani.readthedocs.io/en/latest/, accessed on 8 October 2024). Genomic collinearity was analyzed using genoPlotR (v0.8.11), which visualizes conserved gene order among chloroplast genomes and thus reveals structural homology between genomes [43].
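PYANI computes ANI from pairwise whole-genome comparisons, and the text does not state which clustering method turned the ANI values into a tree. As a simplified, hypothetical illustration of the idea only, the sketch below swaps in plain percent identity computed from already aligned sequences (for example, MAFFT output), converts it to a distance (1 − identity), and clusters the genomes with UPGMA to produce a Newick tree. It is not PYANI's algorithm or API.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical bases over alignment columns without gaps."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no ungapped columns")
    return sum(x == y for x, y in pairs) / len(pairs)


def upgma_newick(alignment):
    """Cluster aligned sequences by UPGMA on 1 - identity; return a Newick tree."""
    # cluster key -> (newick text, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in alignment}
    dist = {
        frozenset((a, b)): 1.0 - pairwise_identity(alignment[a], alignment[b])
        for a, b in combinations(alignment, 2)
    }
    while len(clusters) > 1:
        # Closest pair; ties broken by name so the output is deterministic.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        a, b = sorted(pair)
        d = dist.pop(pair)
        newick_a, size_a, height_a = clusters.pop(a)
        newick_b, size_b, height_b = clusters.pop(b)
        height = d / 2
        merged = f"({newick_a}:{height - height_a:.4f},{newick_b}:{height - height_b:.4f})"
        # UPGMA update: size-weighted average distance to every other cluster.
        for k in clusters:
            d_ak = dist.pop(frozenset((a, k)))
            d_bk = dist.pop(frozenset((b, k)))
            dist[frozenset((merged, k))] = (size_a * d_ak + size_b * d_bk) / (size_a + size_b)
        clusters[merged] = (merged, size_a + size_b, height)
    (newick, _, _), = clusters.values()
    return newick + ";"


aligned = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "TTGTACGAAA",
}
print(upgma_newick(aligned))  # ((A:0.0500,B:0.0500):0.1250,C:0.1750);
```

UPGMA is chosen here only because it is simple and yields an ultrametric tree from a distance matrix; other distance methods, such as neighbor joining, could be substituted.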

3. Results

3.1. Chloroplast Genome Structure, Classification, Function and Characterization

According to the chloroplast genome map (Figure 1), the Thermopsis alpina chloroplast genome has a typical quadripartite structure consisting of a large single-copy region (LSC, 83,818 bp), a small single-copy region (SSC, 17,558 bp), and two inverted repeat regions (IRs, 26,169 bp each), with a total length of 153,714 bp and a GC content of 36.46%. The chloroplast genome contains 124 genes, including 80 protein-coding genes (PCGs), 36 tRNA genes, and 8 rRNA genes. Genes annotated in the Thermopsis alpina chloroplast genome are listed in Table 2. Eleven genes (atpF, ndhA, ndhB, rpl2, rpoC1, trnA-UGC, trnE-UUC, trnK-UUU, trnL-UAA, trnT-CGU, and trnV-UAC) each contain a single intron, whereas two genes (clpP and ycf3) contain two introns.
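The quadripartite total includes both copies of the inverted repeat, so the IR length enters the sum twice. The short Python check below (variable names are illustrative) confirms that the reported region lengths and gene counts add up to the stated totals.

```python
# Reported region lengths (bp) for the T. alpina chloroplast genome.
lsc, ssc, ir = 83_818, 17_558, 26_169
total_length = lsc + ssc + 2 * ir  # two inverted-repeat copies
print(total_length)  # 153714

# Reported gene classes.
gene_counts = {"protein-coding": 80, "tRNA": 36, "rRNA": 8}
print(sum(gene_counts.values()))  # 124
```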

3.2. Nucleotide Composition Analysis

The chloroplast genomes of the 30 leguminous forages ranged in total length from 120,289 bp to 156,763 bp, with a mean of 137,272 bp (Table 3). Total GC content ranged from 33.61% to 36.61%, averaging 34.93%. The mean GC contents at the first and second codon positions (GC1 and GC2) and at synonymous third positions (GC3s) were 44.99%, 37.42%, and 27.80%, respectively, while the GC content at all third positions (GC3) was lowest, averaging 24.79%. The mean ENC values of the chloroplast genes of these 30 forages ranged from 44.68 to 48.70, all above 35, the value commonly used as the threshold for strong codon usage bias. Detailed chloroplast genome characteristics for each species are shown in Table S1.

3.3. RSCU Analysis

RSCU was used to characterize codon bias. The genomes of Glycine max, Glycine soja, and Clitoria ternatea each contained 30 codons with RSCU > 1, whereas the genomes of the remaining 27 species each contained 29; 28 of these 29 codons end in A/U (96.55% of the preferred codons), far more than the preferred codons ending in G/C (Figure 2). This indicates a clear preference for A/U-ending codons across the 30 genomes. The codons commonly preferred across the 30 species were AAA, AAU, ACA, ACU, AGA, AGU, AUU, CAA, CAU, CCA, CCU, CGA, CGU, CUU, GAA, GAU, GCA, GCU, GGA, GGU, GUA, GUU, UAU, UCA, UCU, UGU, UUA, UUG, and UUU. The codon AUA was preferred only in Glycine max, Glycine soja, and Clitoria ternatea (Figure 2). Among these, the leucine codon UUA showed a strong preference (RSCU > 1.8) in all 30 genomes (Figure 2 and Table S2).

3.4. SSR Analysis

The number of SSRs in the complete chloroplast genome sequences ranged from 88 (Vicia sativa) to 192 (Trifolium subterraneum), and SSRs were notably concentrated in intergenic regions (Figure 3a and Table S3). Mononucleotide repeats were the most common type, followed by dinucleotide, octanucleotide, and tetranucleotide repeats (Figure 3a). To identify species-specific and general SSRs, we distinguished SSR motifs unique to certain legume species from those shared across multiple species. In total, 4, 4, 8, 18, 18, 6, and 1 motif types were recorded for mono-, di-, tri-, tetra-, octa-, nona-, and decanucleotide SSRs, respectively. Only four SSR motifs (T, A, TA, and AT) were shared by all 30 species (Figure 3b); the remaining motifs were restricted to subsets of species. Most SSRs were shared among the four species Medicago sativa subsp. falcata, Medicago varia, Medicago sativa, and Medicago hybrida. The SSR motifs of Thermopsis alpina and Thermopsis lanceolata were the most similar, with T. alpina differing from T. lanceolata only by an additional G motif (Figure 3b and Table S3).
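MISA-web was used for the actual search. To show how the minimum-repeat thresholds listed in Section 2.3 translate into SSR calls, here is a simplified, hypothetical Python sketch (find_ssrs is not part of MISA). It scans a sequence from left to right, tries motif lengths 1–10 in increasing order, skips motifs that are themselves repeats of a shorter unit, and reports perfect repeats only, without MISA's handling of compound SSRs.

```python
# Minimum repeat numbers per motif length, as listed in Section 2.3.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3, 9: 3, 10: 3}
VALID_BASES = set("ACGT")


def is_periodic(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect SSRs, with 0-based starts."""
    seq = seq.upper()
    ssrs = []
    i = 0
    while i < len(seq):
        found = None
        for k in sorted(min_repeats):
            motif = seq[i:i + k]
            if len(motif) < k:
                break  # too close to the end of the sequence
            if is_periodic(motif) or not set(motif) <= VALID_BASES:
                continue
            repeats = 1
            while seq[i + repeats * k:i + (repeats + 1) * k] == motif:
                repeats += 1
            if repeats >= min_repeats[k]:
                found = (i, motif, repeats)
                break
        if found:
            ssrs.append(found)
            i += found[2] * len(found[1])  # resume scanning after the SSR
        else:
            i += 1
    return ssrs


demo = "G" + "A" * 12 + "C" + "AT" * 6 + "G"
print(find_ssrs(demo))  # [(1, 'A', 12), (14, 'AT', 6)]
```

Classifying each hit as intergenic, intronic, or coding would additionally require the genome annotation, which this sketch leaves out.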

3.5. ENC Plot Analysis

To further examine the relationship between nucleotide composition and codon usage bias, we used ENC plot analysis to explore the factors influencing codon usage. The ENC values of individual genes across all cpDNAs ranged from 20.00 to 61.00, with a mean of 51.08, indicating that codon usage bias (CUB) in cpDNA is weak and may be affected by several factors, including base mutation and natural selection. The chloroplast genomes of the 30 leguminous forages showed similar patterns of ENC versus GC3s (Figure 4). Most genes were located away from and above the standard curve, whereas a small number lay on or near it. This distribution indicates that codon usage bias in cpDNA was affected more by selection pressure than by mutational pressure.

3.6. PR2-Plot Analysis

To investigate the impact of mutation and selection pressures on codon usage in chloroplast genes, PR2 plots were used to analyze A/T and G/C usage at the third codon position. The scatter points were distributed very unevenly among the four quadrants (Figure 5). Vertically, most points were concentrated in the lower half of the plot, indicating a preference for T over A at the third codon position and underscoring the prominence of T in codon usage among these related species. Horizontally, most points clustered in the right half, indicating a preference for G over C at the third codon position. Quadrant 4 (lower right) had the densest distribution of points, showing that the third codon position favors G and T. This pattern implies that natural selection may play a significant role in determining codon usage bias.

3.7. Neutrality Plot Analysis

Neutrality analyses based on GC12 and GC3 were conducted to quantify the effects of mutational pressure and natural selection on codon usage (Figure 6). The correlation between GC12 and GC3 in the chloroplast genomes was not significant, and most points formed small clusters, suggesting that codon preference was influenced primarily by selection. GC12 and GC3 spanned relatively narrow ranges, from 0.27 to 0.58 and from 0.14 to 0.50, respectively. The regression coefficients of GC12 on GC3 ranged from 0.0096 to 0.3464, well below 1. Interpreting the slope as the relative contribution of mutational pressure, mutation accounted for only 0.96–34.64% of the codon usage patterns in the 30 forage genomes, whereas natural selection and other factors accounted for 65.36–99.04%. Thus, mutational pressure had a limited effect on codon bias, whereas natural selection played the dominant role.

3.8. Correspondence Analysis (COA)

A correspondence analysis (COA) of codon usage in the chloroplast genomes of the 30 leguminous forages revealed interspecies differences in codon usage bias (Figure 7). The first axis (axis 1) and second axis (axis 2) explained 8.14–10.73% and 7.19–9.37% of the variation in codon usage, respectively. Across the 30 species, most genes clustered near the center of the plot, indicating similar CUB characteristics, whereas a few genes lay at the periphery, reflecting divergent CUB patterns and a more pronounced codon usage bias. Interspecies differences in GC content may also influence CUB: Medicago polymorpha had a lower GC content, and its genes were more centrally distributed in the COA plot, whereas those of Thermopsis lanceolata were more dispersed. Nevertheless, variation in GC content did not substantially alter the main trends in the COA, implying that other factors, such as natural selection or mutation, may exert a dominant influence on codon usage bias.

3.9. Collinearity Analysis and Phylogenetic Analysis

To elucidate the evolutionary relationships among the species in this study, a phylogenetic tree was constructed and combined with chloroplast genome collinearity analysis (Figure 8). Each terminal node of the tree represents a species, and branch lengths indicate evolutionary distance. Species within the same clade showed highly similar chloroplast genome structures, whereas genomic regions differed markedly in length and position between clades. Among the 30 legume species, Glycine soja and Glycine max were the most similar. Arachis hypogaea, Stylosanthes hamata, and Stylosanthes guianensis formed a closely related group with high similarity. Notably, the chloroplast genomes of Thermopsis alpina and Thermopsis lanceolata showed the closest relationship and highest similarity. Although Trifolium subterraneum, Trifolium repens, and Trifolium pratense were closely related, their genome structures varied considerably in length and location.

4. Discussion

The chloroplast genome is one of the most important and stable organellar genomes in plants and plays a central role in photosynthesis, energy production, and cellular metabolism [7]. The present study comprehensively analyzed the chloroplast genomes of 30 leguminous forages, providing insights into the evolutionary trajectories, codon bias, and phylogenetic relationships within this important group of plants. Codons are essential for transmitting genetic information, and accurate recognition of the codons encoding different amino acids is crucial for the correct expression of that information [44]. Codon bias results from multiple biological factors, including nucleotide composition [45], mutational bias in GC composition [46], natural selection acting through gene expression level [47,53], tRNA abundance [48], mRNA secondary structure [49], codon hydrophilicity [48], DNA replication initiation and termination sites [50], gene length [51], and selection for translational accuracy [52]. Among these, mutational pressure and natural selection are considered the main factors [54].

4.1. Codon Usage Bias and Natural Selection

A key finding of this study concerns the codon usage pattern in the chloroplast genomes of the 30 leguminous forages. Because codon usage is closely related to GC content, we analyzed the GC content at the three codon positions (GC1, GC2, and GC3) to explore codon usage bias. Total GC content averaged 34.93%, with the highest value at GC1 and the lowest at GC3. These results suggest that the chloroplast genes of the 30 leguminous forages prefer codons ending in A/U, consistent with previous studies; for example, Panicum species tend to use codons ending in A or U [55]. The ENC, which ranges from 20 to 61, measures how far codon usage departs from equal use of synonymous codons: a smaller ENC indicates stronger codon usage bias, and a larger value indicates weaker bias [34]. The mean ENC values of the chloroplast genes of the 30 leguminous forages ranged from 44.68 to 48.70, indicating weak codon usage bias. RSCU analysis revealed highly similar codon usage biases (CUBs) across the 30 chloroplast genomes, with 29 commonly preferred codons (RSCU > 1), 96.55% of which end in A/U. Furthermore, the most frequently used codon in this study was UUA, consistent with studies on Medicago ruthenica [26], Medicago truncatula [56], and Medicago sativa [57]. The high similarity in the type and number of preferred codons among the 30 genomes indicates a consistent codon usage pattern, which may be closely related to the genetic background and functional requirements of plant chloroplast genomes.

4.2. GC Content and Base Composition

GC content plays a crucial role in DNA stability and structure and influences processes such as replication, transcription, and translation. The GC content of the Thermopsis alpina chloroplast genome (36.46%) is close to that of Thermopsis lanceolata and Thermopsis turkestanica [2,58]. The relatively low GC content of these leguminous species may be an evolutionary adaptation to their environmental conditions, potentially enhancing chloroplast DNA replication speed or protein synthesis efficiency. Correspondence analysis further suggested that base composition affects codon usage bias, indicating that GC content contributed to the observed codon usage patterns in these species [59].
This conclusion is supported by the ENC analysis, which revealed a distinct pattern of codon usage bias in the leguminous species examined. The ENC values indicated that the chloroplast genomes of these plants have weaker codon bias than plant groups with more pronounced biases, such as monocots [60]. Regarding the sources of codon usage variation, ENC plot and PR2 plot analyses showed that CUB in the 30 leguminous forages was affected by both mutational pressure and natural selection. In the PR2 plot, genes were distributed unevenly among the four quadrants with respect to A/T and G/C usage at the third codon position: most genes lay below the horizontal midline, and more genes lay to the right of the vertical midline than to the left. Thus, G and T were used more frequently than C and A at the third codon position. These results agree with a previous report that natural selection was the primary factor influencing codon usage bias in the chloroplast genomes of nine leguminous forages [27], and with findings in other forage legumes [56,61].
Under the neutral theory of molecular evolution, mutations at the third codon position are largely neutral or nearly neutral [62]. In a neutrality plot, if mutational pressure is dominant, base composition should change in a similar way at all three codon positions, whereas if natural selection is dominant, the three positions show no such correlation [54]. A previous study showed that natural selection plays an important role in codon usage in chloroplast genes, although the strength of its effect varies among populations [63], and other studies have shown that natural selection is a major driver of CUB in the legume Medicago truncatula [56]. In this study, GC12 showed no significant correlation with GC3, with slopes close to zero. Mutational pressure accounted for only 0.96–34.64% of the codon usage patterns of the 30 forage genomes, whereas natural selection and other factors accounted for 65.36–99.04%. Thus, in the chloroplast genomes of the 30 leguminous forages, codon usage bias is determined mainly by natural selection, with mutational pressure also playing a role, in agreement with a previous study [63]. COA showed that despite some differences in synonymous codon usage among the chloroplast genomes of these species, their overall codon usage patterns were highly consistent, in agreement with a previous report on nine chloroplast genes in Gynostemma species [64]. Further investigation is required to explore additional factors associated with codon usage bias, such as gene expression level, gene length, and RNA structure.

4.3. SSRs and Genetic Diversity

A notable finding of this study concerns the distribution of SSRs across the chloroplast genomes of the 30 leguminous species examined. SSRs were more abundant in intergenic spacer (IGS) regions than in coding sequences (CDS) or introns (Table S3). They also accounted for a significantly greater proportion of repetitive sequences in IGS than in CDS regions. This pattern is typical of many plant chloroplast genomes and resembles that previously reported for the chloroplast genome of Veratrum nigrum L. [67]. Copy number variation in chloroplast SSRs provides important molecular markers that are widely used in studies of plant population genetic diversity and phylogeny, because they are informative over greater taxonomic distances than nuclear and mitochondrial SSRs [65,66]. The high frequency of SSRs in intergenic regions may reflect the dynamic nature of these regions, which can evolve rapidly in response to environmental or evolutionary pressures [68]. This dynamic evolution, driven in part by the accumulation of SSRs, could contribute to the adaptability of leguminous forages to diverse ecosystems and varying environmental conditions [69]. Furthermore, SSR markers could be used in future studies to investigate genetic diversity and population structure among leguminous forage species, information that is critical for breeding programs and conservation efforts.
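The Python sketch below illustrates how SSRs can be detected and assigned to intergenic, intronic or coding regions. It finds perfect mononucleotide to hexanucleotide repeats using regular-expression backreferences, then tallies them against an annotation. Assumptions:

- The minimum repeat numbers (10/5/4/3/3/3) are an assumed MISA-style setting, not necessarily the thresholds used in this study.
- Compound repeats are not merged.
- Motifs that are themselves repeats of a shorter unit are discarded, so each SSR is reported once, under its shortest unit.
- The toy sequence and feature coordinates are hypothetical.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length (MISA-style 10/5/4/3/3/3).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """False if the motif is a repeat of a shorter unit (e.g. 'AA' or 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str) -> list[tuple[int, int, str]]:
    """Perfect SSRs as (start, end, motif) with 0-based, end-exclusive coordinates."""
    seq = seq.upper()
    hits = []
    for k, n in MIN_REPEATS.items():
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):
                hits.append((m.start(), m.end(), m.group(1)))
    return sorted(hits)


def count_by_region(ssrs, features) -> Counter:
    """Tally SSRs as CDS, intron or IGS; features are (start, end, kind) tuples."""
    counts = Counter()
    for start, end, _motif in ssrs:
        kinds = {kind for f_start, f_end, kind in features
                 if start < f_end and f_start < end}  # any overlap
        if "CDS" in kinds:
            counts["CDS"] += 1
        elif "intron" in kinds:
            counts["intron"] += 1
        else:
            counts["IGS"] += 1  # overlaps no annotated gene feature
    return counts


toy = "GGG" + "A" * 12 + "GC" + "AT" * 6 + "GCG"  # hypothetical sequence
toy_features = [(16, 32, "CDS")]                  # hypothetical annotation
ssrs = find_ssrs(toy)
print(ssrs, count_by_region(ssrs, toy_features))
```

Giving CDS overlap priority over intron overlap is a design choice of this sketch. Other tools may instead classify an SSR by its start coordinate.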

4.4. Phylogenetic Analysis and Evolutionary Relationships

Phylogenetic analysis of chloroplast genome sequences has proven to be a robust tool for elucidating evolutionary relationships among plant species [70]. In this study, collinearity and phylogenetic analyses highlighted the structural similarities among closely related leguminous forage species. These results support the hypothesis that the chloroplast genomes of leguminous species are highly conserved, consistent with findings in other plant groups [71,72]. The high degree of collinearity among species suggests a relatively slow rate of evolutionary change in the chloroplast genome. This is typical of organellar genomes, which are subject to strong selective constraints owing to their critical roles in energy production and metabolism [73]. The phylogenetic tree constructed from chloroplast genome data also revealed distinct clades within the leguminous species, which could be explored further to understand the evolutionary trajectories of different subgroups. These clades likely correspond to ancient divergences within the Fabaceae, with different species having adapted to specific ecological niches and environmental conditions over time. For instance, species from different subfamilies or tribes within the Fabaceae may display unique adaptations in their chloroplast genomes, and further investigation is warranted to elucidate the molecular basis of these adaptations [74]. This comparative approach facilitates a deeper understanding of the genomic characteristics and phylogeny of legume forage plants.
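For readers less familiar with how a tree is derived from aligned chloroplast sequences, the Python sketch below computes pairwise p-distances from an existing alignment (for example, one produced by MAFFT [42]). It then clusters the distances with UPGMA and returns a Newick string. UPGMA is a deliberately simple distance method, substituted here for brevity. It assumes a constant evolutionary rate and is not necessarily the method used to build the tree in Figure 8; model-based approaches such as maximum likelihood are more common in chloroplast phylogenomics. The four toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def upgma(alignment: dict[str, str]) -> str:
    """Rooted UPGMA topology, returned as a Newick string without branch lengths."""
    size = {name: 1 for name in alignment}  # cluster label -> number of taxa
    dist = {frozenset(pair): p_distance(alignment[pair[0]], alignment[pair[1]])
            for pair in combinations(alignment, 2)}
    while len(size) > 1:
        # Merge the closest pair of clusters; each label is already a Newick subtree.
        i, j = min(combinations(size, 2), key=lambda pair: dist[frozenset(pair)])
        merged = f"({i},{j})"
        si, sj = size.pop(i), size.pop(j)
        for k in size:
            # Size-weighted average distance from the merged cluster to cluster k.
            dist[frozenset((merged, k))] = (
                si * dist[frozenset((i, k))] + sj * dist[frozenset((j, k))]
            ) / (si + sj)
        size[merged] = si + sj
    (root,) = size
    return root + ";"


toy_alignment = {  # hypothetical aligned sequences
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAT",
    "C": "TTTTTAAAAA",
    "D": "TTTTTAAATT",
}
print(upgma(toy_alignment))
```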

5. Conclusions

This study sequenced and assembled the chloroplast genome of Thermopsis alpina and analyzed in depth the chloroplast genomes of 30 leguminous forage species, including T. alpina, providing valuable insights into their evolutionary patterns, codon usage bias, and phylogenetic relationships. The T. alpina chloroplast genome has a typical quadripartite structure, with a total length of 153,714 bp, and encodes 124 genes. RSCU analysis revealed 28 preferred codons, most of which end in A/U. Neutrality plot, PR2 plot, and ENC plot analyses showed that codon usage bias in the chloroplast genomes of the 30 leguminous forages is shaped by a combination of natural selection and mutational pressure, with natural selection playing the dominant role. Correspondence analysis revealed that base composition shapes codon usage bias to some extent, although the magnitude of this effect varies among leguminous forage species. Additionally, SSR analysis highlighted the abundance of SSRs in intergenic regions, reinforcing their potential role in genome evolution and stability. Phylogenetic analysis revealed clear clustering of species according to their genetic similarity, further supporting the evolutionary relatedness of these legumes. Further investigation of the molecular mechanisms driving codon bias, and of the functional implications of SSRs and other genomic features, will be crucial for enhancing the utilization of these plants in agriculture and ecosystem management. Overall, this research may provide a reference for studies of the evolution and phylogeny of leguminous forages and for understanding codon usage patterns in the chloroplast genomes of other plants.
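Preferred codons like those summarized above are conventionally identified from RSCU. RSCU is the observed count of a codon divided by the mean count across its synonymous family, and RSCU > 1 marks a preferred codon. The Python sketch below shows that calculation under stated assumptions. Genes are in-frame DNA strings. Codons are translated with the standard code, whose amino-acid assignments plastid genes share under NCBI translation table 11. Stop codons are treated as one synonymous family. Function names and the toy input are hypothetical.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code in NCBI TCAG order; plastid genes use the same
# amino-acid assignments (NCBI translation table 11).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def rscu(cds_list: list[str]) -> dict[str, float]:
    """RSCU of each codon: observed count / mean count of its synonymous family."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        counts.update(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    families = defaultdict(list)
    for codon, aa in CODE.items():
        families[aa].append(codon)
    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total:  # skip amino acids absent from the input
            for c in family:
                values[c] = counts[c] * len(family) / total
    return values


def preferred_codons(values: dict[str, float]) -> list[str]:
    """Codons with RSCU > 1, the usual cut-off for a 'preferred' codon."""
    return sorted(c for c, v in values.items() if v > 1)


values = rscu(["TTATTATTGCTT"])  # toy input (hypothetical)
pref = preferred_codons(values)
print(pref, sum(c.endswith(("A", "T")) for c in pref))
```

Counting how many preferred codons end in A or T, as in the last line, is one way to check the A/U-ending preference reported for these genomes. Single-codon families (Met, Trp) always have RSCU = 1 and so never count as preferred.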

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/agronomy15040765/s1. Table S1. Chloroplast genome characteristics. Table S2. Summary of RSCU values of codons in the chloroplast genomes of 30 leguminous forages. Table S3. Simple sequence repeats (SSRs).

Author Contributions

Conceptualization, T.Z.; methodology, Y.X.; software, X.H.; validation, Y.X. and R.Y.; formal analysis, R.Y.; data curation, Y.X.; writing—original draft preparation, R.Y.; writing—review and editing, R.Y., X.H., Y.X. and T.Z.; project administration, T.Z.; funding acquisition, T.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Breeding and Industrialization Demonstration of New High-quality Alfalfa Varieties (No. 2022JBGS0020) and the Breeding of New Alfalfa Varieties (SJCZFY2022-3 to T.Z.).

Data Availability Statement

Upon reasonable request, the datasets used and/or analyzed in this study are available from the corresponding author.

Acknowledgments

We would like to express our sincere gratitude to Gai Yunpeng for his invaluable technical support and guidance throughout this research.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

1. Phelan, P.; Moloney, A.P.; McGeough, E.J.; Humphreys, J.; Bertilsson, J.; O'Riordan, E.G.; O'Kiely, P. Forage Legumes for Grazing and Conserving in Ruminant Production Systems. Crit. Rev. Plant Sci. 2015, 34, 281–326.
2. Jiao, P.P.; Xi, J.; Qu, W.R.; Zhang, S.H.; Yang, T.G.; Wu, Z.H. Complete chloroplast genome sequence of Thermopsis turkestanica Gand. (Leguminosae). Mitochondrial DNA Part B Resour. 2021, 6, 335–336.
3. Yu, X.C.; Zhu, H.Y. Enacting partner specificity in legume–rhizobia symbioses. aBIOTECH 2024, 1–17.
4. Mielmann, A. The utilisation of lucerne (Medicago sativa): A review. Br. Food J. 2013, 115, 590–600.
5. McKenna, P.; Cannon, N.; Conway, J.; Dooley, J. The use of red clover (Trifolium pratense) in soil fertility-building: A Review. Field Crops Res. 2018, 221, 38–49.
6. Soumare, A.; Diedhiou, G.A.; Thuita, M.; Hafidi, M.; Ouhdouch, Y.; Gopalakrishnan, S.; Kouisni, L. Exploiting Biological Nitrogen Fixation: A Route Towards a Sustainable Agriculture. Plants 2020, 9, 1011.
7. Zhang, Y.; Nie, X.; Jia, X.; Zhao, C.; Biradar, S.S.; Wang, L.; Du, X.; Weining, S. Analysis of codon usage patterns of the chloroplast genomes in the Poaceae family. Aust. J. Bot. 2012, 60, 461–470.
8. Dobrogojski, J.; Adamiec, M.; Luciński, R. The chloroplast genome: A review. Acta Physiol. Plant. 2020, 42, 155–167.
9. Rolland, N.; Curien, G.; Finazzi, G.; Kuntz, M.; Maréchal, E.; Matringe, M.; Ravanel, S.; Seigneurin-Berny, D. The Biosynthetic Capacities of the Plastids and Integration Between Cytoplasmic and Chloroplast Processes. Annu. Rev. Genet. 2012, 46, 233–264.
10. Joyard, J.; Ferro, M.; Masselon, C.; Seigneurin-Berny, D.; Salvi, D.; Garin, J.; Rolland, N. Chloroplast proteomics highlights the subcellular compartmentation of lipid metabolism. Prog. Lipid Res. 2009, 49, 128–158.
11. Kwak, S.Y.; Lew, T.T.S.; Sweeney, C.J.; Koman, V.B.; Wong, M.H.; Bohmert-Tatarev, K.; Snell, K.D.; Seo, J.S.; Chua, N.H.; Strano, M.S. Chloroplast-selective gene delivery and expression in planta using chitosan-complexed single-walled carbon nanotube carriers. Nat. Nanotechnol. 2019, 14, 447–455.
12. Kuroiwa, T. Review of cytological studies on cellular and molecular mechanisms of uniparental (maternal or paternal) inheritance of plastid and mitochondrial genomes induced by active digestion of organelle nuclei (nucleoids). J. Plant Res. 2010, 123, 207–230.
13. Wolfe, K.H.; Li, W.H.; Sharp, P.M. Rates of nucleotide substitution vary greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc. Natl. Acad. Sci. USA 1987, 84, 9054–9058.
14. Li, Q.J.; Su, N.; Zhang, L.; Tong, R.C.; Zhang, X.H.; Wang, J.R.; Chang, Z.Y.; Zhao, L.; Daniel, P. Chloroplast genomes elucidate diversity, phylogeny, and taxonomy of Pulsatilla (Ranunculaceae). Sci. Rep. 2020, 10, 19781.
15. Zhou, J.; Chen, X.; Cui, Y.; Sun, W.; Li, Y.; Wang, Y.; Song, J.; Yao, H. Molecular Structure and Phylogenetic Analyses of Complete Chloroplast Genomes of Two Aristolochia Medicinal Species. Int. J. Mol. Sci. 2017, 18, 1839.
16. D'Andrea, W.J.; Theroux, S.; Bradley, R.S.; Huang, X. Does phylogeny control [formula omitted]-temperature sensitivity? Implications for lacustrine alkenone paleothermometry. Geochim. Cosmochim. Acta 2016, 175, 168–180.
17. Deng, N.X.; Sun, Z.X.; Meng, J.; Zhao, Y. Comparison of chloroplast genome characteristics and phylogenetic analysis of five Medicago L. species in M. sect. Medicago. Grassl. Turf. 2024, in press.
18. Wang, X.J.; Dong, W.P.; Zhou, S.L. The Evolution Path of Medicago in China based on the Chloroplast Genome Analysis. Acta Ecol. Sin. 2022, 42, 6125–6136.
19. Wang, Z.; Cai, Q.; Wang, Y.; Li, M.; Wang, C.; Wang, Z.; Jiao, C.; Xu, C.; Wang, H.; Zhang, Z. Comparative Analysis of Codon Bias in the Chloroplast Genomes of Theaceae Species. Front. Genet. 2022, 13, 824610.
20. Mads, M.; Eduardo, V.; Antonio, V.; Martin, W.B. Differential expression of the three independent CaM genes coding for an identical protein: Potential relevance of distinct mRNA stability by different codon usage. Cell Calcium 2022, 107, 102656.
21. Chaney, J.L.; Clark, P.L. Roles for Synonymous Codon Usage in Protein Biogenesis. Annu. Rev. Biophys. 2015, 44, 143–166.
22. Supek, F. The Code of Silence: Widespread Associations Between Synonymous Codon Biases and Gene Function. J. Mol. Evol. 2016, 82, 65–73.
23. Plotkin, J.B.; Kudla, G. Synonymous but not the same: The causes and consequences of codon bias. Nat. Rev. Genet. 2011, 12, 32–42.
24. Thankeswaran, S.P.; Varatharajalu, U.; Vijaipal, B. Codon usage bias. Mol. Biol. Rep. 2021, 49, 539–565.
25. Arella, D.; Dilucca, M.; Giansanti, A. Codon usage bias and environmental adaptation in microbial organisms. Mol. Genet. 2021, 296, 751–762.
26. Tian, C.Y.; Wu, Z.N.; Li, X.S.; Zhiyong, L. Codon Usage Bias of Chloroplast Genome in Medicago ruthenica. Acta Agrestia Sin. 2021, 29, 2678–2684.
27. Xiao, M.; Hu, X.; Li, Y.; Liu, Q.; Shen, S.; Jiang, T.; Zhang, L.; Zhou, Y.; Li, Y.; Luo, X.; et al. Comparative analysis of codon usage patterns in the chloroplast genomes of nine forage legumes. Physiol. Mol. Biol. Plants 2024, 30, 153–166.
28. Gao, S.Y.; Li, Y.M.; Yang, Z.Q.; Dong, K.H.; Xia, F.S. Codon usage bias analysis of the chloroplast genome of Bothriochloa ischaemum. Acta Prataculturae Sin. 2023, 32, 85–95.
29. Jin, J.J.; Yu, W.B.; Yang, J.B.; Song, Y.; dePamphilis, C.W.; Yi, T.S.; Li, D.Z. GetOrganelle: A fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 2020, 21, 241.
30. Dierckxsens, N.; Mardulyn, P.; Smits, G. NOVOPlasty: De novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 2017, 45, e18.
31. Shi, L.; Chen, H.; Jiang, M.; Wang, L.; Wu, X.; Huang, L.; Liu, C. CPGAVAS2, an integrated plastome sequence annotator and analyzer. Nucleic Acids Res. 2019, 47, W65–W73.
32. Kashi, Y.; King, D.G. Simple sequence repeats as advantageous mutators in evolution. Trends Genet. 2006, 22, 253–259.
33. Beier, S.; Thiel, T.; Münch, T.; Scholz, U.; Mascher, M. MISA-web: A web server for microsatellite prediction. Bioinformatics 2017, 33, 2583–2585.
34. Wright, F. The 'effective number of codons' used in a gene. Gene 1990, 87, 23–29.
35. Sueoka, N. Near homogeneity of PR2-bias fingerprints in the human genome and their implications in phylogenetic analyses. J. Mol. Evol. 2001, 53, 469–476.
36. Xiang, H.; Zhang, R.Z.; Butler, R.R.; Liu, T.; Zhang, L.; Pombert, J.F.; Zhou, Z.Y. Comparative Analysis of Codon Usage Bias Patterns in Microsporidian Genomes. PLoS ONE 2015, 10, e129223.
37. He, B.; Dong, H.; Jiang, C.; Cao, F.L.; Tao, S.T.; Xu, L.A. Analysis of codon usage patterns in Ginkgo biloba reveals codon usage tendency from A/U-ending to G/C-ending. Sci. Rep. 2016, 6, 35927.
38. Sueoka, N. Directional mutation pressure and neutral molecular evolution. Proc. Natl. Acad. Sci. USA 1988, 85, 2653–2657.
39. Romero, H.; Zavala, A.; Musto, H.; Bernardi, G. The influence of translational selection on codon usage in fishes from the family Cyprinidae. Gene 2003, 317, 141–147.
40. Wang, H.C.; Hickey, D. Rapid divergence of codon usage patterns within the rice genome. BMC Evol. Biol. 2007, 7, S6.
41. Chen, Y.; Ye, W.; Zhang, Y.; Xu, Y. High speed BLASTN: An accelerated MegaBLAST search tool. Nucleic Acids Res. 2015, 43, 7762–7768.
42. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
43. Guy, L.; Roat Kultima, J.; Andersson, S.G. genoPlotR: Comparative gene and genome visualization in R. Bioinformatics 2010, 26, 2334–2335.
44. Koonin, E.V.; Novozhilov, A.S. Origin and Evolution of the Universal Genetic Code. Annu. Rev. Genet. 2017, 51, 45–62.
45. Li, Y.C.; Korol, A.B.; Fahima, T.; Beiles, A.; Nevo, E. Microsatellites: Genomic distribution, putative functions and mutational mechanisms: A review. Mol. Ecol. 2002, 11, 2453–2465.
46. Sueoka, N.; Kawanishi, Y. DNA G+C content of the third codon position and codon usage biases of human genes. Gene 2000, 261, 53–62.
47. Blake, W.J.; Kaern, M.; Cantor, C.R.; Collins, J.J. Noise in eukaryotic gene expression. Nature 2003, 422, 633–637.
48. Olejniczak, M.; Uhlenbeck, O.C. tRNA residues that have coevolved with their anticodon to ensure uniform and accurate codon recognition. Biochimie 2006, 88, 943–950.
49. Gu, W.J.; Zhou, T.; Ma, J.M.; Sun, X.; Lu, Z.H. Folding type specific secondary structure propensities of synonymous codons. IEEE Trans. Nanobiosci. 2003, 2, 150–157.
50. Deschavanne, P.; Filipski, J. Correlation of GC content with replication timing and repair mechanisms in weakly expressed E. coli genes. Nucleic Acids Res. 1995, 23, 1350–1353.
51. Zheng, S.; Liang, M.; Robert, M.; Xiansheng, Z.; Dawei, H. Analysis of codon usage on Wolbachia pipientis wMel genome. Sci. Sin. 2009, 39, 948–953.
52. Huang, Y.; Koonin, E.V.; Lipman, D.J.; Przytycka, T.M. Selection for minimization of translational frameshifting errors as a factor in the evolution of codon usage. Nucleic Acids Res. 2009, 37, 6799–6810.
53. Hiraoka, Y.; Kawamata, K.; Haraguchi, T.; Chikashige, Y. Codon usage bias is correlated with gene expression levels in the fission yeast Schizosaccharomyces pombe. Genes Cells 2009, 14, 499–509.
54. Sharp, P.M.; Emery, L.R.; Zeng, K. Forces that influence the evolution of codon bias. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 2010, 365, 1203–1212.
55. Li, G.; Zhang, L.; Xue, P. Codon usage pattern and genetic diversity in chloroplast genomes of Panicum species. Gene 2021, 802, 145866.
56. Song, H.; Liu, J.; Chen, T.; Nan, Z.B. Synonymous codon usage pattern in model legume Medicago truncatula. J. Integr. Agric. 2018, 17, 2074–2081.
57. Tao, X.L.; Ma, L.; Zhang, Z.S.; Liu, W.X.; Liu, Z.P. Characterization of the complete chloroplast genome of alfalfa (Medicago sativa) (Leguminosae). Gene Rep. 2017, 6, 667–673.
58. Dorjee, T.; Gao, F.; Zhou, Y.J. The complete chloroplast genome of Thermopsis lanceolata: Genome structure and its phylogenetic relationships within the family Fabaceae. Mitochondrial DNA Part B Resour. 2022, 7, 2076–2080.
59. Zhang, Y.; Shen, Z.N.; Meng, X.R.; Zhang, L.M.; Liu, Z.G.; Liu, M.J.; Zhang, F.; Zhao, J. Codon usage patterns across seven Rosales species. BMC Plant Biol. 2022, 22, 65.
60. Yang, X.M.; Wang, Y.; Gong, W.X.; Li, Y.X. Comparative Analysis of the Codon Usage Pattern in the Chloroplast Genomes of Gnetales Species. Int. J. Mol. Sci. 2024, 25, 10622.
61. Bhattacharyya, D.; Uddin, A.; Das, S.K.; Chakraborty, S. Mutation pressure and natural selection on codon usage in chloroplast genes of two species in Pisum L. (Fabaceae: Faboideae). Mitochondrial DNA A DNA Mapp. Seq. Anal. 2019, 30, 664–673.
62. Sharp, P.M.; Stenico, M.; Peden, J.F.; Lloyd, A.T. Codon usage: Mutational bias, translational selection, or both? Biochem. Soc. Trans. 1993, 21, 835–841.
63. Morton, B.R. Selection on the codon bias of chloroplast and cyanelle genes in different plant and algal lineages. J. Mol. Evol. 1998, 46, 449–459.
64. Zhang, P.; Xu, W.; Lu, X.; Wang, L. Analysis of codon usage bias of chloroplast genomes in Gynostemma species. Physiol. Mol. Biol. Plants 2021, 27, 2727–2737.
65. Pugh, T.; Fouet, O.; Risterucci, A.M.; Brottier, P.; Abouladze, M.; Deletrez, C.; Courtois, B.; Clement, D.; Larmande, P.; N'Goran, J.A.K.; et al. A new cacao linkage map based on codominant markers: Development and integration of 201 new microsatellite markers. Theor. Appl. Genet. 2004, 108, 1151–1161.
66. Powell, W.; Morgante, M.; McDevitt, R.; Vendramin, G.G.; Rafalski, J.A. Polymorphic simple sequence repeat regions in chloroplast genomes: Applications to the population genetics of pines. Proc. Natl. Acad. Sci. USA 1995, 92, 7759–7763.
67. Zhang, Y.M.; Han, L.J.; Yang, C.W.; Yin, Z.L.; Tian, X.; Qian, Z.G.; Li, G.D. Comparative chloroplast genome analysis of medicinally important Veratrum (Melanthiaceae) in China: Insights into genomic characterization and phylogenetic relationships. Plant Divers. 2022, 44, 70–82.
68. Li, Y.C.; Korol, A.B.; Fahima, T.; Nevo, E. Microsatellites within genes: Structure, function, and evolution. Mol. Biol. Evol. 2004, 21, 991–1007.
69. Smýkal, P.; Coyne, C.J.; Ambrose, M.J.; Maxted, N.; Schaefer, H.; Blair, M.W.; Berger, J.; Greene, S.L.; Nelson, M.N.; Besharat, N.; et al. Legume Crops Phylogeny and Genetic Diversity for Science and Breeding. Crit. Rev. Plant Sci. 2015, 34, 43–104.
70. Sabater, B. Evolution and Function of the Chloroplast. Current Investigations and Perspectives. Int. J. Mol. Sci. 2018, 19, 3095.
71. Li, C.C.; Bao, Y.; Hong, T.; Li, J.C.; Yao, M.Z.; Wang, N.; Wu, X.M.; Xie, K.D.; Zhou, Y.F.; Guo, W.W. Insights into chloroplast genome evolution in Rutaceae through population genomics. Hortic. Adv. 2024, 2, 13.
72. Jiang, D.; Cai, X.; Gong, M.; Xia, M.; Xing, H.; Dong, S.; Tian, S.; Li, J.; Lin, J.; Liu, Y.; et al. Correction: Complete chloroplast genomes provide insights into evolution and phylogeny of Zingiber (Zingiberaceae). BMC Genom. 2023, 24, 397.
73. Hu, J.X.; Yan, D.L.; Yuan, H.W.; Zhang, J.H.; Zheng, B.S. Comparative analysis of chloroplast genomes in ten holly (Ilex) species: Insights into phylogenetics and genome evolution. BMC Ecol. Evol. 2024, 24, 133.
74. Feng, Y.; Gao, X.F.; Zhang, J.Y.; Jiang, L.S.; Li, X.; Deng, H.N.; Liao, M.; Xu, B. Complete Chloroplast Genomes Provide Insights Into Evolution and Phylogeny of Campylotropis (Fabaceae). Front. Plant Sci. 2022, 13, 895543.
Figure 1. Annotation map of the cp. genome structure of Thermopsis alpina. The first track shows the dispersed repeats, which consist of direct (D) and palindromic (P) repeats connected with red and green arcs. The second track shows the long tandem repeats as short blue bars. The third track shows the short tandem repeats or microsatellite sequences as short bars with different colors. The fourth track displays the small single-copy (SSC), inverted repeat (IRa and IRb), and large single-copy (LSC) regions. The GC content along the genome is plotted on the fifth track. The genes are shown on the sixth track. The transcription directions for the inner and outer genes are clockwise and anticlockwise, respectively.
Figure 2. Heatmap of relative synonymous codon usage (RSCU) values for each amino acid in the 30 leguminous forages. The x-axis represents amino acids, and the y-axis represents species. Colors range from gray to dark blue, with darker shades indicating higher RSCU values.
Figure 3. Analysis of simple sequence repeats (SSRs) in the chloroplast genomes of 30 leguminous forages. The x-axis represents SSR types, and the y-axis represents species. (a) Distribution of SSR types across species. The stacked bar chart shows the number of SSRs of each type (MonoSSR, DiSSR, TriSSR, TetraSSR, PentaSSR, HexaSSR, HeptaSSR, OctaSSR, NonaSSR, and DecaSSR) detected in each species. Colors distinguish SSR types from MonoSSR to DecaSSR, species are listed along the y-axis, and the length of each colored section indicates the number of SSRs of that type. (b) Analysis of species-specific and shared SSRs among the 30 chloroplast genomes. The x-axis represents SSR types, and the y-axis represents species.
Figure 4. ENC plot analysis of the chloroplast genomes of 30 leguminous forages. Different colors represent different species. The red dotted line is the reference line of theoretical values, representing the expected relationship between GC3s and ENC. Larger points denote higher ENC values. Each facet corresponds to one species; refer to Table 1 for species details.
Figure 5. PR2-plot analysis of the chloroplast genomes of 30 leguminous forages. Different colors represent different species. The blue dotted lines indicate the theoretical values (0.5) of the G3 and A3 ratios, respectively. Larger points denote higher ENC values. Each facet corresponds to one species; refer to Table 1 for species details.
Figure 6. Neutrality plot analysis of the chloroplast genomes of 30 leguminous forages. Different colors represent different species. The blue dotted line represents the fitted line of the regression model and the red dotted line represents the theoretical Y=X line. Larger points denote higher ENC values. Each facet corresponds to one species; refer to Table 1 for species details.
Figure 7. Correspondence analysis of the chloroplast genomes of 30 leguminous forages. Different colors represent different species. The horizontal and vertical lines crossing at the origin (0,0) represent the neutral axes, dividing the plot into four quadrants. Each facet corresponds to one species; refer to Table 1 for species details.
Figure 8. Collinearity analysis and phylogenetic tree of the chloroplast genomes of 30 leguminous forages. Each point represents a chloroplast gene, and the genes are color-coded according to their gene type. The color of the arrows corresponds to distinct functional categories of the genes.
Table 1. Chloroplast genome information and toxicity attributes of 30 leguminous forages.
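| No. | Species Name | Accession Number | Genome Size (bp) | CDS Number | Toxic |
| --- | --- | --- | --- | --- | --- |
| 1 | Trifolium subterraneum | EU849487 | 144,763 | 69 | N |
| 2 | Lupinus luteus | KC695666 | 151,894 | 74 | Y |
| 3 | Medicago truncatula f. tricycla | KF241982 | 123,355 | 67 | N |
| 4 | Glycine soja | KF611800 | 152,217 | 77 | N |
| 5 | Arachis hypogaea | KJ468094 | 156,395 | 75 | N |
| 6 | Lathyrus odoratus | KJ850237 | 120,289 | 70 | Y |
| 7 | Medicago hybrida | KJ850240 | 125,208 | 70 | N |
| 8 | Vicia sativa | KJ850242 | 122,467 | 70 | Y |
| 9 | Medicago sativa | KU321683 | 128,574 | 70 | N |
| 10 | Stylosanthes hamata | MG735673 | 156,502 | 76 | N |
| 11 | Melilotus albus | MH191352 | 127,205 | 72 | N |
| 12 | Sesbania cannabina | MN105118 | 153,978 | 72 | N |
| 13 | Clitoria ternatea | MN709849 | 151,673 | 76 | Y |
| 14 | Thermopsis lanceolata | MN841458 | 151,526 | 78 | Y |
| 15 | Trifolium repens | MT506238 | 132,429 | 70 | N |
| 16 | Lotus corniculatus | MT528596 | 150,700 | 72 | Y |
| 17 | Onobrychis viciifolia | MT528597 | 122,102 | 71 | N |
| 18 | Astragalus laxmannii | MT786136 | 122,844 | 70 | Y |
| 19 | Stylosanthes guianensis | MZ382862 | 156,763 | 76 | N |
| 20 | Astragalus sinicus | OM287552 | 123,830 | 71 | Y |
| 21 | Medicago × varia | OM362847 | 125,698 | 71 | N |
| 22 | Medicago sativa subsp. falcata | OM681369 | 125,406 | 71 | N |
| 23 | Medicago lupulina | OM681370 | 124,107 | 72 | N |
| 24 | Galega orientalis | ON262206 | 125,280 | 67 | Y |
| 25 | Melilotus officinalis | ON729392 | 126,946 | 72 | N |
| 26 | Medicago polymorpha | OP589400 | 124,163 | 71 | N |
| 27 | Trifolium pratense | OQ694003 | 134,370 | 65 | Y |
| 28 | Glycine max | OR664111 | 152,226 | 77 | N |
| 29 | Thermopsis turkestanica | MN841459 | 153,538 | 76 | Y |
| 30 | Thermopsis alpina | PV262343 | 153,714 | 80 | Y |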
Note: Y indicates toxicity, N indicates no toxicity.
Table 2. Genes annotated in the cp. genomes of Thermopsis alpina.
| Category of Genes | Group of Genes | Gene Name |
| --- | --- | --- |
| Genes for photosynthesis | Subunits of photosystem I (5) | psaA, psaB, psaC, psaI, psaJ |
| | Subunits of photosystem II (14) | psbA, psbB, psbC, psbD, psbE, psbF, psbI, psbJ, psbK, psbH, psbM, psbN, psbT, psbZ |
| | Subunits of cytochrome (6) | petA, petB, petD, petG, petL, petN |
| | Subunits of ATP synthase (6) | atpA, atpB, atpE, atpF *, atpH, atpI |
| | Subunits of NADH-dehydrogenase (12) | ndhA *, ndhB *a, ndhC, ndhD a, ndhE, ndhF, ndhH, ndhI, ndhJ, ndhK |
| | Subunit of rubisco (1) | rbcL |
| Transcription and translation | Large subunit of ribosome (10) | rpl14, rpl16, rpl2 *a, rpl20, rpl23 a, rpl32, rpl33, rpl36 |
| | DNA-dependent RNA polymerase (4) | rpoA, rpoB, rpoC1 *, rpoC2 |
| | Small subunit of ribosome (12) | rps11, rps12, rps14, rps15, rps16, rps18, rps19, rps3, rps4, rps7 a, rps8 |
| | rRNA Genes (8) | rrn16S a, rrn23S a, rrn4.5S a, rrn5S a |
| | tRNA Genes (36) | trnA-UGC *a, trnC-GCA, trnD-GUC, trnE-UUC *, trnF-GAA, trnG-GCC, trnG-UCC, trnH-GUG, trnI-GAU, trnK-UUU *, trnL-CAA a, trnL-UAA *, trnL-UAG, trnM-CAU a, trnfM-CAU, trnN-GUU a, trnP-UGG, trnQ-UUG, trnR-ACG a, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-CGU *, trnT-GGU, trnT-UGU, trnV-UAC *, trnV-GAC a, trnW-CCA, trnY-GUA |
| Other genes | Subunit of Acetyl-CoA-carboxylase (1) | accD |
| | c-type cytochrome synthesis gene (1) | ccsA |
| | Envelope membrane protein (1) | cemA |
| | Protease (1) | clpP ** |
| | Maturase (1) | matK |
| Genes of unknown function | Conserved open reading frames (5) | ycf1, ycf2 a, ycf3 **, ycf4 |
Note: * indicates that the gene contains one intron, ** indicates that the gene contains two introns, a represents repeat genes (genes in the IR region).
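The group counts in Table 2 sum to the 124 genes reported for the T. alpina chloroplast genome only if the second copy of each IR-duplicated gene (marked a) is counted. The short Python sketch below makes that bookkeeping explicit. The counts and gene names are transcribed by hand from the table; the variable names are illustrative and not part of the source. Subtracting one copy of each duplicated gene also recovers the number of distinct genes.

```python
# Gene counts per group, transcribed by hand from Table 2 (T. alpina).
# Counts include the second copy of every IR-duplicated gene (marked "a").
GENE_GROUPS = {
    "Genes for photosynthesis": {
        "photosystem I": 5, "photosystem II": 14, "cytochrome": 6,
        "ATP synthase": 6, "NADH dehydrogenase": 12, "RuBisCO": 1,
    },
    "Transcription and translation": {
        "large ribosomal subunit": 10, "RNA polymerase": 4,
        "small ribosomal subunit": 12, "rRNA": 8, "tRNA": 36,
    },
    "Other genes": {
        "acetyl-CoA carboxylase": 1, "c-type cytochrome synthesis": 1,
        "envelope membrane protein": 1, "protease": 1, "maturase": 1,
    },
    "Genes of unknown function": {"conserved ORFs": 5},
}

# Genes flagged "a" (present in both inverted repeats) in Table 2.
IR_DUPLICATED = [
    "ndhB", "ndhD", "rpl2", "rpl23", "rps7",
    "rrn16S", "rrn23S", "rrn4.5S", "rrn5S",
    "trnA-UGC", "trnL-CAA", "trnM-CAU", "trnN-GUU", "trnR-ACG", "trnV-GAC",
    "ycf2",
]

# Sum the group counts within each functional category.
totals = {cat: sum(groups.values()) for cat, groups in GENE_GROUPS.items()}
total_genes = sum(totals.values())
# Each IR gene appears twice in the totals, so remove one copy of each.
unique_genes = total_genes - len(IR_DUPLICATED)

for cat, n in totals.items():
    print(f"{cat}: {n}")
print(f"Total gene copies: {total_genes}; distinct genes: {unique_genes}")
```

Running it prints per-category totals of 44, 70, 5 and 5, giving 124 gene copies, which corresponds to 108 distinct genes once the 16 IR duplicates are collapsed.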
Table 3. Chloroplast genome codon base composition and characteristics of 30 leguminous forages.
No. | Species Name | Total Length (bp) | Total GC (%) | A (%) | T (%) | C (%) | G (%) | GC1 (%) | GC2 (%) | GC3 (%) | GC3s (%) | Average ENC
1 | Trifolium subterraneum | 144,763 | 34.40 | 31.67 | 33.93 | 17.74 | 16.66 | 45.55 | 37.60 | 28.63 | 25.69 | 46.61
2 | Lupinus luteus | 151,894 | 36.61 | 31.37 | 32.02 | 18.35 | 18.26 | 44.73 | 37.57 | 29.55 | 26.63 | 47.33
3 | Medicago truncatula f. tricycla | 123,355 | 34.02 | 32.66 | 33.32 | 17.22 | 16.80 | 45.23 | 37.11 | 26.82 | 23.81 | 44.71
4 | Glycine soja | 152,217 | 35.38 | 32.36 | 32.26 | 17.40 | 17.97 | 44.13 | 36.84 | 27.48 | 24.47 | 46.00
5 | Arachis hypogaea | 156,395 | 36.37 | 31.85 | 31.78 | 18.04 | 18.33 | 44.67 | 37.70 | 29.65 | 26.71 | 47.47
6 | Lathyrus odoratus | 120,289 | 35.16 | 32.34 | 32.51 | 17.46 | 17.70 | 45.40 | 37.54 | 27.99 | 24.92 | 46.03
7 | Medicago hybrida | 125,208 | 33.82 | 33.08 | 33.10 | 16.38 | 17.44 | 45.18 | 37.3 | 26.78 | 23.80 | 45.26
8 | Vicia sativa | 122,467 | 35.15 | 32.39 | 32.46 | 18.11 | 17.04 | 45.41 | 37.41 | 27.82 | 24.85 | 46.57
9 | Medicago sativa | 128,574 | 34.39 | 32.81 | 32.81 | 16.58 | 17.81 | 45.27 | 37.30 | 26.84 | 23.86 | 45.11
10 | Stylosanthes hamata | 156,502 | 36.58 | 31.74 | 31.68 | 18.17 | 18.41 | 44.80 | 37.69 | 30.11 | 27.17 | 48.70
11 | Melilotus albus | 127,205 | 33.61 | 33.27 | 33.12 | 16.29 | 17.33 | 45.06 | 37.32 | 26.77 | 23.81 | 45.54
12 | Sesbania cannabina | 153,978 | 35.61 | 31.86 | 32.53 | 18.07 | 17.55 | 44.31 | 37.12 | 28.04 | 25.02 | 46.24
13 | Clitoria ternatea | 151,673 | 34.55 | 32.84 | 32.60 | 17.24 | 17.31 | 43.74 | 36.67 | 26.84 | 23.81 | 46.51
14 | Thermopsis lanceolata | 151,526 | 36.44 | 31.78 | 31.78 | 18.06 | 18.37 | 45.06 | 37.94 | 28.33 | 25.31 | 46.14
15 | Trifolium repens | 132,429 | 34.28 | 31.81 | 33.90 | 17.79 | 16.49 | 45.32 | 37.92 | 27.80 | 24.82 | 46.00
16 | Lotus corniculatus | 150,700 | 36.03 | 32.06 | 31.91 | 18.29 | 17.75 | 44.44 | 37.05 | 28.93 | 26.00 | 46.92
17 | Onobrychis viciifolia | 122,102 | 34.58 | 32.86 | 32.56 | 17.88 | 16.71 | 45.23 | 37.37 | 27.66 | 24.63 | 45.07
18 | Astragalus laxmannii | 122,844 | 34.11 | 33.01 | 32.88 | 16.52 | 17.60 | 44.81 | 37.30 | 27.15 | 24.17 | 45.07
19 | Stylosanthes guianensis | 156,763 | 36.43 | 31.82 | 31.75 | 18.10 | 18.33 | 44.95 | 37.88 | 29.93 | 27.01 | 48.19
20 | Astragalus sinicus | 123,830 | 34.10 | 33.02 | 32.89 | 16.49 | 17.61 | 44.93 | 37.19 | 27.17 | 24.15 | 45.51
21 | Medicago x varia | 125,698 | 33.81 | 33.08 | 33.10 | 16.36 | 17.45 | 45.26 | 37.32 | 26.81 | 23.82 | 44.87
22 | Medicago sativa subsp. falcata | 125,406 | 33.88 | 33.09 | 33.03 | 16.41 | 17.47 | 45.26 | 37.31 | 26.83 | 23.85 | 44.98
23 | Medicago lupulina | 124,107 | 33.88 | 33.05 | 33.07 | 16.43 | 17.45 | 45.02 | 37.19 | 26.79 | 23.78 | 44.78
24 | Galega orientalis | 125,280 | 34.11 | 32.92 | 32.97 | 17.58 | 16.53 | 45.75 | 38.05 | 26.73 | 23.66 | 45.99
25 | Melilotus officinalis | 126,946 | 33.85 | 33.09 | 33.06 | 16.32 | 17.53 | 44.83 | 37.07 | 26.76 | 23.84 | 45.43
26 | Medicago polymorpha | 124,163 | 34.09 | 32.94 | 32.96 | 16.49 | 17.60 | 45.32 | 37.21 | 27.10 | 24.10 | 44.68
27 | Trifolium pratense | 134,370 | 34.27 | 33.89 | 31.84 | 16.52 | 17.75 | 45.80 | 38.12 | 27.42 | 24.38 | 46.15
28 | Glycine max | 152,226 | 35.37 | 32.37 | 32.25 | 17.40 | 17.98 | 44.16 | 36.86 | 27.44 | 24.43 | 45.98
29 | Thermopsis turkestanica | 153,538 | 36.60 | 31.70 | 31.69 | 18.18 | 18.42 | 45.19 | 37.87 | 28.74 | 25.71 | 47.30
30 | Thermopsis alpina | 153,714 | 36.46 | 31.77 | 31.78 | 18.09 | 18.36 | 44.94 | 37.58 | 28.89 | 25.92 | 47.10
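Table 3 reports GC content at the first, second and third codon positions (GC1, GC2, GC3), GC3s and the average ENC for each species. The source does not describe its exact computation here, so the Python sketch below is only an illustration of how such values are commonly derived, not the authors' pipeline. Two assumptions are worth stating. First, GC3s follows a widespread convention that excludes Met (ATG), Trp (TGG) and stop codons. Second, the expected ENC uses Wright's formula, which predicts ENC from GC3s under mutation pressure alone; the ENC ratio, (ENCexp − ENCobs)/ENCexp, is the quantity usually plotted in ENC-ratio analyses. The function names are hypothetical. ENC plots are normally built gene by gene, so plugging in the species-level averages from Table 3 gives only a rough picture.

```python
# Codons without synonymous alternatives (Met, Trp) plus stop codons,
# excluded from GC3s under a common convention (assumption, see lead-in).
SYNONYMOUS_EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc_by_position(cds: str) -> dict:
    """Return GC1, GC2, GC3 and GC3s (percentages) for an in-frame CDS."""
    seq = cds.upper().replace("U", "T")
    if not seq or len(seq) % 3:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Count G or C at each of the three codon positions.
    counts = [0, 0, 0]
    for codon in codons:
        for pos in range(3):
            if codon[pos] in "GC":
                counts[pos] += 1
    n = len(codons)
    # GC3s: third-position GC restricted to codons with synonymous alternatives.
    syn = [c for c in codons if c not in SYNONYMOUS_EXCLUDED]
    gc3s_hits = sum(1 for c in syn if c[2] in "GC")
    return {
        "GC1": 100 * counts[0] / n,
        "GC2": 100 * counts[1] / n,
        "GC3": 100 * counts[2] / n,
        "GC3s": 100 * gc3s_hits / len(syn) if syn else float("nan"),
    }


def expected_enc(gc3s_percent: float) -> float:
    """Wright's expected ENC given GC3s: 2 + s + 29 / (s^2 + (1 - s)^2)."""
    s = gc3s_percent / 100
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def enc_ratio(observed_enc: float, gc3s_percent: float) -> float:
    """ENC ratio: (expected - observed) / expected."""
    exp = expected_enc(gc3s_percent)
    return (exp - observed_enc) / exp


if __name__ == "__main__":
    # Toy in-frame sequence: ATG GCC GCG AAA TGA.
    print(gc_by_position("ATGGCCGCGAAATGA"))
    # Species-level values for T. alpina from Table 3 (GC3s 25.92%, ENC 47.10).
    print(round(expected_enc(25.92), 2), round(enc_ratio(47.10, 25.92), 3))
```

For the toy sequence, GC1 is 40%, GC2 and GC3 are 60%, and GC3s is about 66.7% because ATG and TGA are dropped. For T. alpina, the expected ENC comes out near 49.3, a little above the observed average of 47.10. That small positive ratio fits the weak codon usage bias described in the abstract, although a per-gene analysis would be needed to draw any firm conclusion.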

Share and Cite

MDPI and ACS Style

Yang, R.; Xue, Y.; He, X.; Zhang, T. Comprehensive Analysis of Chloroplast Genomes in Leguminous Forage Species: Codon Usage, Phylogenetic Relationships, and Evolutionary Insights. Agronomy 2025, 15, 765. https://doi.org/10.3390/agronomy15040765