Article

Evolutionary Dynamics of Chloroplast Genome and Codon Usage in the Genus Diospyros (Ebenaceae)

Liaoning Key Laboratory of Development and Utilization for Natural Products Active Molecules, Anshan Normal University, Anshan 114000, China
*
Author to whom correspondence should be addressed.
Biology 2025, 14(11), 1568; https://doi.org/10.3390/biology14111568 (registering DOI)
Submission received: 16 October 2025 / Revised: 3 November 2025 / Accepted: 7 November 2025 / Published: 9 November 2025

Simple Summary

Diospyros is a large and important genus of trees with ecological and economic value. This study analyzed chloroplast genomes and codon usage in 15 Diospyros species to understand their evolution and genetic diversity. The results showed low genetic variation in the IR regions and high conservation at boundary areas, with three main evolutionary groups identified. Codon usage analysis revealed a preference for A or U at the third position and weak codon bias overall. Natural selection, rather than mutation pressure, was found to be the main factor shaping codon usage patterns. These findings provide a robust foundation for future investigations into molecular evolution and phylogenetic relationships in the genus Diospyros.

Abstract

Diospyros, the most species-rich woody plant genus in Ebenaceae, has attracted significant academic interest due to its ecological and economic importance. This study presents the first complete assembly and annotation of the chloroplast genome of Diospyros tsangii. The chloroplast genome measured 157,445 bp, with a typical quadripartite circular structure and 132 annotated genes. A comprehensive analysis of evolutionary traits and codon usage preferences across the chloroplast genomes of 15 Diospyros species was conducted. The main objective was to provide a theoretical basis for understanding phylogenetic relationships and assessing genetic diversity within Diospyros. Our findings showed that genetic diversity in the IR regions of the chloroplast genomes is notably lower than that in the LSC and SSC regions. The boundary regions exhibited high conservation with minimal variation. Selective pressure analysis indicated that most coding genes are under purifying selection. Phylogenetic analysis showed that D. tsangii was sister to Diospyros oleifera, and Diospyros kaki was closely related to Diospyros vaccinioides, with high support values. The examination of codon usage patterns showed that the GC content at the first, second, and third codon positions of 52 protein-coding sequences followed the order GC1 > GC2 > GC3, with a preference for A or U bases at the third position. The effective number of codons ranged from 45.13 to 45.43, which indicated weak codon bias. The neutral-plot, ENC-plot, and PR2-plot analyses suggested that natural selection predominantly influences the codon usage patterns in Diospyros plants. These results will help to clarify the evolutionary dynamics of the genus Diospyros.

1. Introduction

The genus Diospyros L. is the largest genus of Ebenaceae and comprises about 485 species of evergreen or deciduous woody plants. The genus is widely distributed from tropical to temperate regions all over the world [1]. The highest species diversity is found in the Asia–Pacific region, which hosts around 300 species. Among the 60 documented species of Diospyros in China, 45 species are endemic, and 18 species are stenochoric, with a diversity center in southeast and southwest China [2]. Diospyros is also the most economically important genus of the Ebenaceae [3]. Several of its species are widely recognized for their valuable timber or edible fruits, while numerous others play a crucial role as sources of medicinal compounds [4,5,6,7,8]. For example, Diospyros kaki is a globally significant fruit tree, known for its nutritional content, including tannins, sugars, vitamin C, and carotenoids. Closely related wild species are generally vital for improving persimmon crops through breeding. Therefore, it is necessary to accumulate genetic information for exploring the genetic diversity of Diospyros.
The chloroplast genome has been proven to be informative and valuable for plant phylogenetic studies [9,10,11]. Typically, it consists of a pair of inverted repeat (IR) regions, a long single-copy (LSC) region, and a short single-copy (SSC) region, containing around 110–130 genes, mainly encoding essential proteins for photosynthesis and energy metabolism [12,13]. Comparative genomic analysis of chloroplast genomes identifies highly variable regions suitable for developing specific molecular markers such as SSRs and DNA barcodes [14,15,16,17]. For instance, Li et al. (2023) effectively resolved the phylogenetic relationships of Eriocaulon (Eriocaulaceae) using the complete plastome, and indicated that the genus diverged in the late Miocene and diversified in the Quaternary [18]. Similarly, Wang et al. (2023) clarified species relationships within Lagerstroemia (Lythraceae) using plastome data and explored a rapid radiation since the late Miocene [19]. Jiang et al. (2023) reconstructed phylogenetic relationships within Zingiber (Zingiberaceae) with plastome data and identified 19 genes under positive selection [20]. Furthermore, Huang et al. (2024) provided robust support for classifying Engelhardia species at the sectional level and identified three subfamilies within Juglandaceae, demonstrating the effectiveness of plastome sequences for achieving high phylogenetic resolution [21]. Lastly, Qu et al. (2025) conducted a comprehensive study of chloroplast genome characteristics across diverse cassava genetic resources, significantly enhancing understanding of chloroplast genome evolution in this vital crop species [22]. In general, chloroplast genomes can effectively resolve phylogenetic relationships at different taxonomic levels.
Additionally, codon usage bias (CUB) refers to the non-random selection of synonymous codons that encode the same amino acid during protein-coding processes, reflecting the adaptability of genetic transcription and translation [23]. CUB is characterized by codon base composition, usage frequency, and evolutionary forces such as natural selection and mutation pressure [24,25]. The evolutionary conservation, lack of recombination, and maternal inheritance of the chloroplast genome make CUB analysis valuable for assessing genetic diversity [26,27,28,29]. For instance, Yang et al. (2024) revealed a consistent preference for codons ending in A or T across 18 Taraxacum species, and this observed CUB was mainly influenced by natural selection [30]. In contrast, Guo et al. (2025) examined CUB in the chloroplast genomes of ten Androsace species and demonstrated that the preference for codons ending in A or U was more closely related to mutational pressure than to natural selection [31].
In this study, a thorough investigation into the evolution of chloroplast genomes and codon usage bias was conducted to address the genetic diversity within the genus Diospyros. The primary objectives of this study were: (1) to comprehensively characterize chloroplast genome evolution in Diospyros, and (2) to explore codon usage bias among Diospyros species and identify optimal codons. These findings will support the use of Diospyros germplasm resources and enhance our understanding of genome evolution in economically and horticulturally important plants.

2. Materials and Methods

2.1. Sampling Collection and Sequencing Procedure

The plant materials of Diospyros tsangii were obtained from Jinggangshan in Jiangxi Province, China (26.5818° N, 114.1386° E). The leaves were preserved in silica gel. Total genomic DNA was isolated using a modified CTAB protocol [32]. Following extraction, an Illumina paired-end (PE) library was prepared and sequenced at Personalbio Biotechnology Co., Ltd., located in Shanghai, China. The chloroplast genome sequences of the other Diospyros species were downloaded from GenBank (Table 1).

2.2. Chloroplast Genome Assembly and Annotation

For chloroplast genome assembly, approximately 6 Gb of raw paired-end reads (150 bp in length) were generated. The Trimmomatic v0.39 software was used to process the raw data by removing adapter sequences and trimming low-quality regions (LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:36), resulting in high-quality clean reads [33]. These filtered reads were then assembled into a complete chloroplast genome using GetOrganelle v1.5 [34]. Annotation of the D. tsangii chloroplast genome was performed utilizing GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html (accessed on 2 December 2024)) and Geneious v9.1.4 (http://www.geneious.com/ (accessed on 2 December 2024)), with D. oleifera (NC_030787) serving as the reference genome [35]. The fully annotated chloroplast genome sequence of D. tsangii has been deposited in GenBank under the accession number PX413321.
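The four trimming steps named above can be illustrated with a simplified sketch. This is not Trimmomatic's own implementation: the function name `trim_read` is hypothetical, the input is assumed to be a list of Phred quality scores for one read, and the sliding-window step is reduced to cutting at the start of the first window whose mean quality falls below the threshold.

```python
def trim_read(quals, leading=3, trailing=3, window=4, min_avg=15, minlen=36):
    """Return (start, end) of the retained slice, or None if the read is dropped.

    Simplified illustration of LEADING / TRAILING / SLIDINGWINDOW / MINLEN;
    the real tool handles edge cases that this sketch ignores.
    """
    start, end = 0, len(quals)
    # LEADING: drop bases below the threshold from the 5' end
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING: drop bases below the threshold from the 3' end
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW: cut at the first window whose mean quality is too low
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_avg:
            end = i
            break
    # MINLEN: discard reads that became too short
    if end - start < minlen:
        return None
    return start, end


if __name__ == "__main__":
    example = [2] + [30] * 40 + [2, 2]
    print(trim_read(example))
```

With the defaults shown, a read is kept only if at least 36 bases survive all three trimming steps.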

2.3. Comparative Analysis of Chloroplast Genomes in the Genus Diospyros

2.3.1. Analysis of Simple Sequence Repeat in Chloroplast Genomes

The simple sequence repeat (SSR) characteristics of chloroplast genomes from 15 economically valuable species in the genus Diospyros were examined using the MISA tool (https://webblast.ipk-gatersleben.de/misa/ (accessed on 20 February 2025)) [36]. The analysis was conducted with the following minimum repeat thresholds: at least 10 repeats for mononucleotide motifs, 6 for dinucleotide motifs, and 5 for tri-, tetra-, penta-, and hexanucleotide motifs.
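The thresholds above can be made concrete with a small regular-expression sketch. It is not MISA itself: `find_ssrs` and `MIN_REPEATS` are hypothetical names, positions are 0-based, and compound or interrupted repeats are not handled.

```python
import re

# Minimum repeat counts per motif length, taken from the text above
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(2)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT"
            if any(unit == unit[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((m.start(), unit, len(m.group(1)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GC" + "A" * 12 + "CG" + "AT" * 6 + "GGC"
    print(find_ssrs(demo))
```

In the demo sequence, the run of 12 A bases passes the mononucleotide threshold and the six AT units pass the dinucleotide threshold.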

2.3.2. Comparative Assessment of Chloroplast Genome Junction Regions

The IRscope platform (https://irscope.shinyapps.io/irapp/ (accessed on 25 February 2025)) was employed to conduct a comparative examination of the junctions between the large single-copy (LSC), small single-copy (SSC), and inverted repeat (IR) regions, particularly emphasizing the visualization of IR boundary contractions [37]. Furthermore, boundary-associated genes were delineated, and variations in gene types and sizes were scrutinized to evaluate the structural integrity of chloroplast genomes.

2.3.3. Evaluation of Nucleic Acid Polymorphism and Selective Pressure

Nucleotide polymorphism (Pi) was calculated from aligned sequences using DnaSP v6.0 software [38]. A sliding window approach was applied with a window size of 600 bp and a step size of 200 bp, based on the plant chloroplast genome model. Synonymous (dS) and non-synonymous (dN) substitutions, as well as the dN/dS ratio, were analyzed by aligning the protein-coding sequences of D. tsangii with those of 14 other Diospyros species. D. tsangii was used as the reference in pairwise gene alignments. The 80 common protein-coding genes were extracted using Geneious Prime 2021, and dN and dS values were computed using DnaSP v6.0. To assess selection pressures acting on chloroplast genes with distinct functional roles, CDS genes were classified into photosynthesis-related, self-replication-related, and other functional categories (Table S1). Box plots of dN/dS values for CDS genes were generated according to functional classifications and taxonomic groups, with significant intergroup differences indicated. All analyses were performed in R version 3.4.4.
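As a rough illustration of the sliding-window scheme (window 600 bp, step 200 bp), the sketch below computes Pi as the mean proportion of differing sites over all sequence pairs. It assumes a gap-free alignment of equal-length strings; DnaSP treats gaps and missing data in its own way, so values from real alignments may differ. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_pi(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total = 0.0
    for a, b in pairs:
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        total += diffs / n_sites
    return total / len(pairs)


def sliding_pi(alignment, window=600, step=200):
    """Return (window midpoint, Pi) for each full window along the alignment."""
    length = len(alignment[0])
    results = []
    for start in range(0, length - window + 1, step):
        block = [s[start:start + window] for s in alignment]
        results.append((start + window // 2, pairwise_pi(block)))
    return results


if __name__ == "__main__":
    toy = ["AAAA", "AAAT", "AATT"]
    print(sliding_pi(toy, window=2, step=2))
```

Plotting Pi against the window midpoints gives the kind of profile used to locate highly variable regions.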

2.3.4. Phylogenetic Tree Reconstruction and Time Estimation

Multiple sequence alignments were conducted using MAFFT version 7 [39], with subsequent removal of poorly aligned regions utilizing Gblocks version 0.91b [40]. Phylogenetic reconstruction was carried out on complete chloroplast genome sequences using both Bayesian inference (BI) and maximum parsimony (MP) methods. For BI analysis, MrBayes v3.2.6 was employed under the GTR + G substitution model [41]. MP analysis was carried out in PAUP version 4b10 using heuristic search strategies, and nodal support was evaluated through 1000 bootstrap replicates [42].
Divergence times were estimated using BEAST v2.7.7 under a relaxed exponential clock model [43]. The root age was set to 28.01 Ma (S.D. = 2.2) under a normal prior [44]. The speciation prior was set to Yule, and the substitution model of DNA regions was set to GTR + I + G. Markov Chain Monte Carlo (MCMC) searches were run for 100 million generations and sampled every 5000 generations. Convergence was assessed with Tracer v.1.7 to ensure that the effective sample size (ESS) for all parameters was >200 [45]. The maximum clade credibility (MCC) tree was calculated with TreeAnnotator v.2.6.0 [46]. Following Linan et al., D. ferrea from Clade B was set as the outgroup, and the other sampled Diospyros species belonged to Clade A [44].

2.4. Analysis of Codon Usage Bias Pattern in Chloroplast Genomes in Diospyros

2.4.1. Calculation of Parameters Related to Codon Usage Bias

Biases in calculating preference indices can arise from issues like insufficient sample size and incomplete codon coverage in short genes [47,48]. These biases complicate the differentiation between selection pressure effects and random factors, undermining research credibility. To enhance statistical reliability and biological interpretation accuracy, this study excluded genes shorter than 300 bp. Additionally, gene sequences with non-ATG start codons, abnormal stop codons, and internal stop codons were omitted. A total of 52 protein-coding genes were scrutinized for codon bias across the chloroplast genomes of 15 Diospyros species in this investigation.
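The exclusion criteria above can be sketched as a simple filter. The function name `passes_filters` is hypothetical, "abnormal stop codon" is interpreted here as a terminal codon other than TAA, TAG, or TGA, and the requirement that the length be a multiple of three is an added assumption not stated in the text.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def passes_filters(cds, min_len=300):
    """Return True if a coding sequence meets the filtering criteria."""
    cds = cds.upper()
    # Genes shorter than 300 bp are excluded; the multiple-of-three check
    # is an assumption made for this sketch
    if len(cds) < min_len or len(cds) % 3 != 0:
        return False
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    if codons[0] != "ATG":              # non-ATG start codon
        return False
    if codons[-1] not in STOP_CODONS:   # abnormal stop codon
        return False
    # Internal stop codons
    return not any(c in STOP_CODONS for c in codons[1:-1])


if __name__ == "__main__":
    ok = "ATG" + "GCT" * 98 + "TAA"
    print(passes_filters(ok))
```

Applying such a filter to every annotated CDS of a genome would leave the retained gene set used for the codon bias analysis.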
The software CodonW v1.4.2 was employed to compute the Relative Synonymous Codon Usage (RSCU) and Effective Number of Codons (ENC) [49,50]. RSCU denotes the relative frequency of codon usage for encoding specific amino acids, with values above 1 indicating high preference, values at 1 indicating no preference, and values below 1 indicating weak preference. The data analysis was conducted using IBM SPSS 29.0. Additionally, the overall GC content (GC_all) of each gene’s coding sequence, as well as the GC content at the first (GC1), second (GC2), and third (GC3) nucleotide positions within codons, were determined utilizing the CUSP tool (https://bioinformatics.nl/cgi-bin/emboss/cusp (accessed on 25 February 2025)).
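For orientation, RSCU and position-specific GC content can be computed directly from a coding sequence, as in the minimal sketch below. It is not the CodonW or CUSP implementation: the function names are hypothetical, sequences are written in DNA letters (T instead of U), and the standard translation table is assumed for the amino acid assignments.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split a sequence into complete codons, ignoring any trailing bases."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def gc_by_position(cds):
    """Return (GC1, GC2, GC3) as fractions over all codons."""
    codons = split_codons(cds)
    return tuple(
        sum(c[pos] in "GC" for c in codons) / len(codons) for pos in range(3)
    )


def rscu(cds):
    """RSCU = observed codon count / mean count of its synonymous family."""
    counts = Counter(c for c in split_codons(cds) if c in CODON_TABLE)
    values = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        total = sum(counts[c] for c in family)
        if total:  # amino acids absent from the sequence are skipped
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    demo = "ATGGCTGCTGCAGCCTTATTATTGTAA"
    print(gc_by_position(demo))
    print(rscu(demo)["GCT"], rscu(demo)["TTA"])
```

A codon used exactly as often as expected under equal synonymous usage has an RSCU of 1, which is the reference point described above.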

2.4.2. Analysis of the Causes of Codon Usage Bias

To systematically identify the primary forces shaping CUB in Diospyros, several quantitative approaches, including neutral-plots, ENC-plots, and PR2-plots, were applied. Initially, the GC contents at the first and second positions (GC1 and GC2) in the coding sequences were computed to derive the average GC12 value. Subsequently, a neutral plot was constructed with GC3 on the horizontal axis and GC12 on the vertical axis to investigate the predominant influence of mutation or selection pressure on CUB. A significant correlation indicates mutation as the primary factor, while an insignificant correlation suggests a greater contribution of selection pressure in shaping codon usage patterns [51,52]. For the ENC-plot analysis, observed ENC values (ENCobs) were plotted against GC3s (GC content at synonymous third sites), with expected ENC values (ENCexp) calculated using the formula $ENC_{exp} = 2 + GC3_s + 29/[GC3_s^2 + (1 - GC3_s)^2]$. The ggplot2 package in R was utilized for visualizing the ENC-plot, where discrepancies between ENCobs and ENCexp values reveal the main driver of codon preference [53]. The standard curve represents codon preference solely influenced by mutation in the absence of selective pressure [54]. The PR2-plot method was applied to evaluate the combined effects of mutational pressure and selection on codon usage patterns. This graphical representation uses A3/(A3 + T3) on the y-axis and G3/(G3 + C3) on the x-axis. The position and orientation of each gene on the plot indicate its CUB, with the central point (A = T, C = G) corresponding to balanced codon usage, indicating no bias [55,56].
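The three diagnostics described above reduce to a few lines of arithmetic, sketched below under stated assumptions. The function names are hypothetical; the neutrality slope is an ordinary least-squares fit of GC12 on GC3; and the relative deviation `(ENCexp - ENCobs) / ENCexp` is a commonly used summary that is assumed here rather than taken from the text.

```python
def expected_enc(gc3s):
    """Expected ENC under mutation alone: 2 + s + 29 / (s^2 + (1 - s)^2)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def enc_ratio(enc_obs, gc3s):
    """Relative deviation of observed ENC from the expected curve (assumed metric)."""
    exp = expected_enc(gc3s)
    return (exp - enc_obs) / exp


def neutrality_slope(gc3, gc12):
    """Least-squares slope of GC12 (y) regressed on GC3 (x), one point per gene."""
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    return sxy / sxx


def pr2_point(third_bases):
    """Return (G3/(G3+C3), A3/(A3+T3)) from a string of third-position bases."""
    a, t, g, c = (third_bases.upper().count(b) for b in "ATGC")
    return g / (g + c), a / (a + t)


if __name__ == "__main__":
    print(expected_enc(0.3))
    print(neutrality_slope([0.2, 0.3, 0.4], [0.40, 0.41, 0.42]))
    print(pr2_point("AATTTGGGC"))
```

A slope near 0 in the neutrality plot and points far from (0.5, 0.5) in the PR2-plot are the patterns usually read as evidence that selection, rather than mutation alone, shapes codon usage.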

2.4.3. Determination of Optimal Codons

In the assessment of optimal codons, ENC served as the benchmark for evaluating codon usage bias. Genes were sorted by their ENC values, and the top 10% and bottom 10% were selected to form high- and low-expression gene pools, respectively. Subsequently, the RSCU and the ∆RSCU (calculated as RSCUhigh − RSCUlow) for codons in these gene pools were computed. Codons exhibiting both an RSCU greater than 1 and a ∆RSCU of at least 0.08 were classified as optimal codons [49].
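The selection rule can be sketched as follows. Two points are assumptions rather than statements from the text: genes with the lowest ENC (strongest bias) are taken as the high-expression pool, and the RSCU > 1 criterion is applied to RSCU computed over all genes. The function names and the dictionary inputs are hypothetical.

```python
def split_by_enc(enc_by_gene, fraction=0.10):
    """Return (high_pool, low_pool) gene lists from the two ENC extremes.

    Assumption: low ENC means strong bias, so those genes form the high pool.
    """
    ranked = sorted(enc_by_gene, key=enc_by_gene.get)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


def optimal_codons(rscu_high, rscu_low, rscu_all, min_rscu=1.0, min_delta=0.08):
    """Codons with overall RSCU > 1 and (RSCUhigh - RSCUlow) >= 0.08."""
    optimal = []
    for codon, value in rscu_all.items():
        delta = rscu_high.get(codon, 0.0) - rscu_low.get(codon, 0.0)
        if value > min_rscu and delta >= min_delta:
            optimal.append(codon)
    return sorted(optimal)


if __name__ == "__main__":
    enc = {"g%d" % i: 30 + i for i in range(20)}
    print(split_by_enc(enc))
    print(optimal_codons(
        {"GCT": 1.8, "GCC": 0.4, "TTA": 2.0, "TTG": 1.0},
        {"GCT": 1.4, "GCC": 0.8, "TTA": 1.95, "TTG": 1.1},
        {"GCT": 1.5, "GCC": 0.6, "TTA": 1.9, "TTG": 1.2},
    ))
```

In the toy example only GCT qualifies: TTA is frequent overall but differs too little between the pools, and TTG is less frequent in the high pool than in the low pool.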

3. Results

3.1. Chloroplast Genome Characters of Diospyros tsangii

The complete chloroplast genome of D. tsangii, the newly sequenced species in this study, exhibits a typical quadripartite structure with a length of 157,445 bp, comprising an SSC of 18,523 bp, an LSC of 86,744 bp, and a pair of IRs of 26,089 bp each (Figure S1). Annotation revealed a total of 132 genes, including 87 protein-coding genes, 37 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes (Table S1). Of these genes, 74 were associated with self-replication, encompassing 11 genes linked to the large ribosomal subunit and 14 to the small ribosomal subunit. Furthermore, 45 genes were implicated in photosynthesis, with 6 genes related to ATP synthase, 12 to NADH dehydrogenase, 6 to the cytochrome b/f complex, 5 to the PS I system, 15 to the PS II system, and 1 associated with Rubisco. Additionally, 13 genes were annotated with either other functions (infA, clpP, ccsA, accD, cemA, and matK) or unknown functions (ycf1, ycf2, ycf3, ycf4, and ycf15). Among the genes, 14 contained a single intron (atpF, ndhA, ndhB, petB, petD, rpl2, rpl16, rpoC1, trnA-UGC, trnG-UCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC), while 3 genes (rps12, clpP, and ycf3) harbored two introns (Table S1).

3.2. Comparative Analysis of Chloroplast Genomes in Diospyros

3.2.1. The Size and Structure of the Chloroplast Genome

Based on the newly obtained D. tsangii chloroplast genome, we further conducted a comparative evolutionary analysis with fourteen other Diospyros species (Table 1). Chloroplast genome sizes varied from 157,368 bp (D. rhombifolia) to 157,999 bp (D. hainanensis), with a consistent GC content of 37.4% across all species. Each chloroplast genome comprised an LSC, an SSC, and two inverted repeat regions (IRa and IRb). The LSC region ranged from 86,744 bp (D. tsangii) to 87,523 bp (D. hainanensis), accounting for 55.09% to 55.43% of the total genome length. The SSC region ranged from 18,322 bp (D. hainanensis) to 18,536 bp (D. kaki), representing 11.60% to 11.76% of the genome. The IR regions varied from 25,874 bp (D. strigosa) to 26,180 bp (D. dumetorum), comprising 32.88% to 33.17% of the total length. Each of the 15 Diospyros species had 132 genes, including 87 CDSs, 8 rRNAs, and 37 tRNAs.
In total, 48–77 SSR loci were identified within the chloroplast genomes of the 15 Diospyros species (Figure 1A, Table S2). These loci comprised 29–66 mononucleotide repeats, predominantly A/T repeats, as well as 2–5 dinucleotide repeats, 1–4 trinucleotide repeats, and 5–10 tetranucleotide repeats (Figure 1B, Table S2). Furthermore, pentanucleotide repeats were observed in eight species, namely D. maclurei, D. hainanensis, D. strigosa, D. eriantha, D. dumetorum, D. rhombifolia, D. cathayensis, and D. sutchuensis. Examination of the long repeat sequences revealed 15–27 forward repeats, 1–6 inverted repeats, and 21–34 palindromic repeats. Additionally, a complementary repeat was identified in D. maclurei and D. dumetorum (Figure 1C, Table S3). The results indicated a positive association between the length of the IR and the overall chloroplast genome size, while no significant correlation was found between the number of SSRs and the total chloroplast genome length (Figure 1D,E).

3.2.2. Boundary Analysis of IR in the Genus Diospyros

The collinearity analysis conducted on 15 Diospyros species did not reveal any gene rearrangements or inversions (Figure S2). The gene composition near the boundaries of LSC/IRb (JLB), IRb/SSC (JSB), SSC/IRa (JSA), and IRa/LSC (JLA) remained consistent among species, although there were slight variations in the distances between these genes and their respective boundaries (Figure 2). Mapping of the IR boundary revealed that the rps19 gene in Diospyros species spanned from LSC to IRb, with a 7–8 bp extension into the IRb region. Additionally, the ndhF gene was located adjacent to the junction between the SSC and IRb (JSB). A 4-bp extension across this boundary was observed in D. rhombifolia and D. cathayensis, while the remaining species did not exhibit such an extension. Furthermore, the ycf1 gene was found adjacent to the junction between the SSC and IRa (JSA).

3.2.3. Chloroplast Genome Sequence Variation and Selected Pressure Assessment

Nucleotide diversity in the chloroplast genomes of 15 Diospyros species was evaluated using DnaSP software, with Pi values ranging from 0 to 0.082 (Figure 3). Notably, regions with high variability were predominantly situated in the LSC and SSC regions, whereas the IR regions displayed greater conservation. Specifically, within the LSC region, the trnT-trnL (0.02486), petA-psbJ (0.02557), and psbE-petL (0.03183) spacer regions exhibited the highest Pi values. In the SSC region, three regions showed the greatest diversity: ycf1 (0.082), rpl32-trnL (0.08065), and ndhA (0.08014).
To assess the evolutionary pressures on homologous protein-coding genes across 15 Diospyros species, we calculated the dN/dS ratios for 80 protein-coding genes (Figure S3, Table S4). The analysis revealed that 79 of these genes had dN/dS values below 1, indicating that they are predominantly under purifying selection. Interestingly, the photosynthesis gene psbI had a dN/dS value slightly above 1, suggesting potential positive selection. Furthermore, no significant differences were observed in the dN/dS values among self-replication-related, photosynthesis-related, and other genes across the 15 Diospyros species.

3.2.4. Phylogenetic Analysis

The BI and MP phylogenetic trees based on complete chloroplast genome sequences of Diospyros species are shown in Figure 4. The phylogenetic relationships among the sampled Diospyros species were fully resolved. D. sutchuensis, D. cathayensis, and D. rhombifolia formed a clade with the highest support values (BI-PP = 1.00, MP-BS = 100%). D. dumetorum, D. eriantha, and D. strigosa also clustered together (BI-PP = 1.00, MP-BS = 100%). These two clades then clustered with D. hainanensis with high support. Additionally, D. glaucifolia, D. lotus, and D. morrisiana formed a clade with the highest support values (BI-PP = 1.00, MP-BS = 100%). Finally, four species (D. tsangii, D. oleifera, D. kaki, and D. vaccinioides) formed a clade with high support (BI-PP = 1.00, MP-BS = 100%). In particular, D. kaki was closely related to D. vaccinioides.
Our results suggested that D. tsangii diverged from its sister species D. oleifera at 1.62 Ma (95% HPD: 0.18–5.00), and D. kaki diverged from D. vaccinioides at 0.44 Ma (95% HPD: 0.03–2.57) (Figure 4).

3.3. Codon Usage Bias in the Chloroplast Genomes of Diospyros

3.3.1. Codon Composition Characteristics and Preferences of Chloroplast Genomes

Fifty-two conserved genes were retained for the CUB analysis of 15 Diospyros species (Table S5). The GC contents of the three codon positions, GC1, GC2, and GC3, ranged from 46.95% to 47.05%, 39.44% to 39.58%, and 27.70% to 27.91%, respectively. The average overall GC content (GC_all) fell within the range of 38.05% to 38.18%. The order of GC content observed was GC1 > GC2 > GC3, indicating a non-uniform distribution of GC content among codon positions. All GC values were below 50%, suggesting a preference for A/U bases in codons, particularly at the third position. The ENC for the chloroplast genomes of the 15 Diospyros species ranged from 45.13 to 45.43 (Table 2), implying weak codon usage bias. The ycf3 gene displayed the highest ENC value, while the rps18 gene exhibited the lowest ENC value in all species except D. kaki and D. vaccinioides, in which rps16 had the lowest value (Table 2).
Analysis of codon composition parameters across all species revealed consistent patterns (Figure 5). GC_all exhibited a significant positive correlation with GC1, GC2, and GC3, with GC1 strongly correlated with GC2. Conversely, correlations between GC1 and GC3, and between GC2 and GC3, were not significant. These findings suggested that base composition at the first and second positions is similar, while the third position differs significantly in GC content. ENC displayed a correlation with GC3 across all species. Significant correlations between ENC and GC1 were observed only in D. rhombifolia and D. cathayensis, while a significant correlation between ENC and GC2 was noted solely in D. maclurei. No significant correlations were found between ENC and GC_all or between ENC and the number of codons. These results indicate that base composition influences codon preference, particularly at the third position, while the number of codons does not significantly affect ENC.
In the chloroplast genomes of 15 Diospyros species, RSCU values of 30 codons exceeded 1 (Figure S4). Notably, 29 of these codons predominantly ended with A or U. Noteworthy preferences included leucine favoring the UUA codon and alanine showing a preference for GCU. Collectively, these chloroplast genomes demonstrated a high degree of uniformity in codon usage, particularly emphasizing a predilection for A or U in the third codon position.

3.3.2. The Causes of Codon Usage Bias

The neutral-plot demonstrated that chloroplast genes from 15 Diospyros species were consistently positioned above the diagonal line, with most genes notably deviating from it (Figure S5). The regression coefficients for each species were uniformly modest, ranging from 0.0502 to 0.1326. Notably, the lowest coefficient was observed in D. sutchuensis, while the highest was in D. vaccinioides, suggesting that mutation pressure explains only 5.02% to 13.26% of the variance, with natural selection accounting for 86.74% to 94.98%.
The analysis of ENC-plot revealed that the GC3s values of individual genes predominantly fall within the range of 0.2 to 0.4, while the corresponding ENC values range from 30 to 55 (Figure S6). With the exception of the ycf3 gene positioned above the anticipated standard curve, and the rps2 and ndhH genes, which closely followed the curve, the majority of genes exhibited a notable deviation below the standard curve. Notably, a consistent pattern in the frequency distributions across species was illustrated in Figure S7. Approximately 20% of genes fall within the range of −0.05 to 0.05, while the remaining extended beyond this interval and predominantly clustered between 0.05 and 0.15. Collectively, the ENC analysis of Diospyros chloroplast genes indicated a weak association between the codon usage biases and GC3s variations, with the codon preferences of the majority of genes primarily shaped by natural selection.
PR2-plot analysis was performed on the A/T and C/G bases at the third position of codons within the chloroplast genomes of 15 Diospyros species (Figure S8). The analysis revealed an uneven distribution of scatter points across the four quadrants, with a predominant concentration in the lower right quadrant. This distribution suggested an imbalance in the utilization of the four bases at the third codon position in the chloroplast genomes of these Diospyros plants, primarily influenced by natural selection. These findings aligned with those obtained from neutral-plot and ENC-plot analyses.

3.3.3. Optimal Codons of Chloroplast Genome in Diospyros

Optimal codon analysis was performed utilizing ENC and RSCU values from the chloroplast genomes of the 15 Diospyros species (Figure 6). The number of optimal codons ranged from 14 to 21, with D. oleifera, D. tsangii, and D. lotus displaying the highest number at 21. Conversely, D. hainanensis, D. rhombifolia, D. cathayensis, and D. sutchuensis shared the lowest count of optimal codons. Nine optimal codons were consistently identified across all 15 species: UGU (cysteine), CAA (glutamine), GAA (glutamic acid), GGU (glycine), CUU and UUA (leucine), AAA (lysine), and GUA and GUU (valine). Notably, only one optimal codon, UUG (leucine), ended with G, and it was observed solely in D. vaccinioides, while the remaining codons ended with A or U. These findings suggest a pronounced preference for codons ending in A/U in the codon usage patterns of Diospyros chloroplast genomes.

4. Discussion

4.1. Chloroplast Genome Evolution Within Diospyros

Chloroplast genomes are commonly utilized in studies focusing on genomic evolution, nucleotide substitution patterns, and phylogenetic relationships in plants [57]. This research newly sequenced and annotated the chloroplast genome of D. tsangii. The complete chloroplast genome was 157,445 bp in length with 37% GC content (Figure S1), exhibiting a typical quadripartite structure. This species contained 132 genes, including 87 CDSs, 8 rRNAs, and 37 tRNAs (Table 1), which is similar to other Diospyros species [58,59,60]. The quadripartite boundaries of the chloroplast genomes of 15 Diospyros species were relatively conserved, with the LSC/IRb and SSC/IRa boundaries located within the rps19 and ycf1 genes, respectively (Figure 2). Analysis of SSRs revealed 48 to 77 loci in the chloroplast genomes of 15 Diospyros species (Figure 1A, Table S2), predominantly comprising mononucleotide repeats, notably A/T bases (Figure 1B, Table S2). Furthermore, four types of long-repeat sequences were identified in D. dumetorum and D. maclurei, while three types were detected in other species, primarily forward repeats and palindromic repeats, potentially involved in chloroplast genome replication and repair mechanisms (Figure 1C, Table S3). Correlation analysis indicated a significant positive relationship between the length of the IR region and the total chloroplast genome length, while no significant correlation was found between the number of SSRs and the total genome length (Figure 1D,E). These findings indicated that the chloroplast genomes of Diospyros are generally structurally conserved yet still exhibit genetic diversity within the genus.
Additionally, six highly variable regions were identified, including four intergenic spacers (trnT-trnL, petA-psbJ, psbE-petL, rpl32-trnL) and two gene regions (ycf1, ndhA) (Figure 3). In contrast, Li et al. (2018) reported eight highly variable regions (trnH-psbA, rps16-trnQ, rpoB-trnC, rps4-trnT-trnL, ndhF, ndhF-rpl32-trnL, ycf1a, and ycf1b), with some differences possibly attributed to varying sample selections [59]. Considering the regions that overlap between the two studies, trnT-trnL, rpl32-trnL, ycf1a, and ycf1b are strongly suggested as candidate highly variable regions for Diospyros, providing a valuable molecular basis for species identification and phylogenetic analysis in the genus. Phylogenetic relationships among the sampled Diospyros species were well resolved (Figure 4), which also confirms the strong power of chloroplast genomes in phylogenetic inference. Furthermore, the close phylogenetic relationship among D. tsangii, D. oleifera, D. vaccinioides, and the cultivated persimmon (D. kaki) suggested that these wild relatives are promising germplasm resources for the breeding of new cultivars.

4.2. Natural Selection and the Codon Preference of the Diospyros Chloroplast Genome

Variations in codon usage frequencies among plant genes are a significant feature in plant evolution [49]. Analysis of codon usage bias is a valuable tool for understanding the evolutionary history of plants [23,61,62]. Mutation and natural selection minimally impact the third-position bases of codons, leading to the utilization of the GC3 content for codon analysis [63]. In our study, we observed a descending gradient distribution of GC1 > GC2 > GC_all > GC3 in the chloroplast genomes of 15 Diospyros species, with GC3 values ranging from 27.70% to 27.91% (Table 1), indicating a notable preference for A/U bases at the third position of codons. This preference was similar to the findings from chloroplast genome analyses of Camellia (GC3 = 28.59–28.64%) [64], Rosaceae (GC3 = 28.27–28.61%) [65], and Juglandaceae (GC3 = 28.2–29.26%) [66], supporting the notion that angiosperm plants tend to favor codons ending with A/U [67,68]. Furthermore, ENC values for the chloroplast genomes of the 15 Diospyros species ranged from 45.13 to 45.43 (Table 2). Apart from rps18, rps14, and rpl16, most genes in the 15 species exhibited ENC values above 35, indicating weak codon usage bias and similar bias patterns. This observation aligned with Sharp et al.’s (1988) proposition that species with close genetic relationships exhibit highly similar codon usage biases [69].
Natural selection and mutation are the primary determinants of plant codon preference. Optimal codons are favored by natural selection, while non-preferred codons can arise from mutations [62,66,70,71]. This study investigated the factors influencing codon usage bias in the chloroplast genomes of Diospyros plants using ENC-plot, PR2-plot, and neutral-plot analyses. Neutral-plot analysis revealed a weak correlation between GC12 and GC3. Regression analysis indicated that mutational pressure contributed between 5.02% and 13.26% to codon evolution in the 15 Diospyros species (Figure S5), highlighting the predominant role of natural selection in chloroplast genome codon evolution within this genus. This finding was corroborated by the ENC-plot and PR2-plot analyses. ENC-plot analysis showed that most genes fell below the expected curve (Figure S6), while PR2-plot analysis demonstrated higher usage frequencies of G and C compared to A and T (Figure S8). Collectively, these results indicate that natural selection primarily governs chloroplast codon usage in the 15 Diospyros species. Moreover, the estimated divergence times showed that the target species D. tsangii and its sister species D. oleifera diverged during the middle Pleistocene (1.62 Ma, 95% HPD: 0.18–5), and that the most economically valuable species, D. kaki, diverged from its closest relative during the late Pleistocene (0.44 Ma, 95% HPD: 0.03–2.57) (Figure 4), which also implies that codon usage in these species has been shaped more by natural selection than by mutation.
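The logic of the neutrality plot and the ENC-plot can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the procedure used here: the function names and toy values are hypothetical, the slope is an ordinary least-squares fit of GC12 on GC3 across genes, and the expected ENC follows Wright's formula [50], ENC = 2 + s + 29/(s² + (1 − s)²), with s = GC3s.

```python
def neutrality_slope(gc3, gc12):
    """Least-squares slope of GC12 (y) regressed on GC3 (x) across genes.

    A slope near 1 suggests codon usage dominated by mutational pressure;
    a slope near 0 suggests that selection or other constraints dominate.
    """
    if len(gc3) != len(gc12) or len(gc3) < 2:
        raise ValueError("need two equal-length lists with at least 2 genes")
    n = len(gc3)
    mean_x = sum(gc3) / n
    mean_y = sum(gc12) / n
    sxx = sum((x - mean_x) ** 2 for x in gc3)
    if sxx == 0:
        raise ValueError("GC3 values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gc3, gc12))
    return sxy / sxx


def expected_enc(gc3s):
    """Expected ENC under mutational bias alone (Wright's curve), s = GC3s."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


# Toy per-gene values (hypothetical): GC12 rises by 0.01 per 0.1 rise in GC3
slope = neutrality_slope([0.2, 0.3, 0.4], [0.41, 0.42, 0.43])
print(f"mutational contribution ~ {slope * 100:.1f}%")

print(expected_enc(0.5))     # 60.5, close to the maximum of the curve
print(expected_enc(0.2801))  # GC3s reported for D. oleifera in Table 2
```

For example, with the genome-wide GC3s of D. oleifera in Table 2 (0.2801), the expected ENC is about 50.9, above the reported average ENC of 45.25. This is only a coarse genome-level comparison, since the ENC-plot itself is evaluated gene by gene.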
Additionally, synonymous and non-synonymous substitution patterns serve as important indicators in the study of gene evolution [72]. Under purifying selection, non-synonymous mutations tend to be eliminated more efficiently than synonymous ones, leading to a lower substitution rate at non-synonymous sites. Consequently, the dN/dS ratio remains below 1 in most instances [73]. To better understand the adaptive evolutionary dynamics of plastomes within the genus Diospyros, we computed the dN/dS values for protein-coding genes. Our analysis revealed that only the psaI gene exhibited a dN/dS ratio exceeding 1, while the remaining 79 genes showed ratios less than 1, reflecting widespread and strong purifying selection across the plastid genome (Figure S3, Table S4). These findings are consistent with previous studies of Engelhardia [21], Camellia [64], and Theaceae [74]. For instance, the neutrality plot analysis of 13 Camellia samples conducted by Chen et al. (2023) reported a mutational contribution rate ranging from 7.68% to 10.02%, based on the slope of GC12 to GC3 in this genus [64]. Moreover, in the chloroplast genomes of Theaceae, the slope of GC12 to GC3 is −11.12% to 14.84%, highlighting that natural selection is the primary influence on codon usage bias [74]. Within this context, Diospyros plants have developed a codon usage pattern that combines universal elements with taxon-specific features. These results not only lay the groundwork for codon optimization in the molecular breeding of Diospyros plants but also shed light on the adaptive strategies adopted by chloroplast genomes in woody plants over extended periods of evolution.
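The interpretation of the dN/dS ratio used above can be summarized compactly. The symbol ω is the conventional shorthand for this ratio and is introduced here only for illustration; it does not appear elsewhere in this article.

```latex
\omega = \frac{d_N}{d_S}, \qquad
\begin{cases}
\omega < 1 & \text{purifying (negative) selection} \\
\omega = 1 & \text{neutral evolution} \\
\omega > 1 & \text{positive (diversifying) selection}
\end{cases}
```

Under this scheme, the 79 genes with ratios below 1 fall in the first category, and psaI is the only gene falling in the third.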

5. Conclusions

The genus Diospyros, recognized as an economically important group of woody plants, has attracted considerable scientific interest. In this study, a detailed comparative analysis was carried out on the chloroplast genomes of 15 species within the genus, uncovering substantial conservation in GC content, gene content, and overall genomic architecture. A total of 48–77 SSR loci were identified, highlighting their potential application as molecular markers in genetic studies. Codon usage analysis revealed a slight preference for A/U-ending codons in Diospyros, a pattern largely attributed to natural selection rather than mutational bias. These results provide critical genomic resources that can support future investigations into molecular evolution and phylogenetic relationships in the genus Diospyros.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biology14111568/s1, Table S1: List of gene contents in the plastome of Diospyros tsangii; Table S2: Types and numbers of SSRs within the chloroplast genomes of 15 Diospyros species; Table S3: Types and numbers of the repeat sequences in 15 Diospyros species; Table S4: The dN/dS values of CDS genes within 15 species of the Diospyros genus; Table S5: Genes used for codon usage bias analysis within the chloroplast genomes of 15 Diospyros species; Figure S1: Chloroplast genome map of D. tsangii; Figure S2: Colinearity analysis of the chloroplast genomes of 15 Diospyros species; Figure S3: Evolutionary pressure assessment of plastid gene orthologs across 15 Diospyros species; Figure S4: The RSCU of amino acids in 15 species chloroplast genomes within the genus Diospyros; Figure S5: Neutral-plot analysis in 15 species chloroplast genomes within the genus Diospyros; Figure S6: ENC-plot analysis in 15 species chloroplast genomes within the genus Diospyros; Figure S7: Distribution of ENC ratio in 15 species chloroplast genomes within the genus Diospyros; Figure S8: PR2-plot analysis in 15 species chloroplast genomes within the genus Diospyros.

Author Contributions

J.Z.: conceptualization, investigation, data curation, funding acquisition, supervision, writing—draft, writing—review and editing. Z.L.: conceptualization, formal analysis, data curation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Funding of Liaoning Key Laboratory of Development [LZ202301].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The chloroplast genome data of D. tsangii have been deposited in the GenBank international repository. The complete list of Accession Numbers corresponding to the samples analyzed in this study is provided within Table 1.

Conflicts of Interest

The authors declare that they have no known competing interests.

References

  1. Lee, S.K.; Gilberg, M.G.; White, F. Diospyros Linnaeus. In Flora of China; Wu, Z.-Y., Raven, P.H., Hong, D.-Y., Eds.; Science Press: Beijing, China, 1996; Volume 15, pp. 215–234.
  2. Tang, D.; Zhang, Q.; Xu, L.; Guo, D.; Luo, Z. Number of Species and Geographical Distribution of Diospyros L. (Ebenaceae) in China. Hortic. Plant J. 2019, 5, 59–69.
  3. Duangjai, S.; Samuel, R.; Munzinger, J.; Forest, F.; Wallnöfer, B.; Barfuss, M.H.J.; Fischer, G.; Chase, M.W. A multi-locus plastid phylogenetic analysis of the pantropical genus Diospyros (Ebenaceae), with an emphasis on the radiation and biogeographic origins of the New Caledonian endemic species. Mol. Phylogenet. Evol. 2009, 52, 602–620.
  4. Chen, X.-N.; Fan, J.-F.; Yue, X.; Wu, X.-R.; Li, L.-T. Radical scavenging activity and phenolic compounds in persimmon (Diospyros kaki L. cv. Mopan). J. Food Sci. 2008, 73, C24–C28.
  5. Park, Y.-S.; Leontowicz, H.; Leontowicz, M.; Namiesnik, J.; Jesion, I.; Gorinstein, S. Nutraceutical value of persimmon (Diospyros kaki Thunb.) and its influence on some indices of atherosclerosis in an experiment on rats fed cholesterol-containing diet. Adv. Hortic. Sci. 2008, 22, 250–254.
  6. Jang, I.-C.; Jo, E.-K.; Bae, M.-S.; Lee, H.-J. Antioxidant and antigenotoxic activities of different parts of persimmon (Diospyros kaki cv. Fuyu) fruit. J. Med. Plants Res. 2010, 4, 155–160.
  7. Yi, Z.; Qiao, J.-J.; Lu, G.-Y.; Wu, G.; Xie, G.-Y.; Qin, M.-J. Identification of six species of medicinal Diospyros plants based on leaf macro-and micro-morphology. Chin. J. Chin. Mater. Med. 2016, 41, 3942–3949, (In Chinese with English Abstract).
  8. Han, W.-J.; Zhang, Q.; Pu, T.-T.; Wang, Y.-R.; Li, H.-W.; Luo, Y.; Li, T.-S.; Fu, J.-M. Diversity of fruit quality in astringent and non-astringent persimmon fruit germplasm. Horticulturae 2023, 9, 24.
  9. Kaushal, C.; Abdin, M.Z.; Kumar, S. Chloroplast genome transformation of medicinal plant Artemisia annua. Plant Biotechnol. J. 2020, 18, 2155–2157.
  10. Li, H.-T.; Luo, Y.; Gan, L.; Ma, P.-F.; Gao, L.-M.; Yang, J.-B.; Cai, J.; Gitzendanner, M.A.; Fritsch, P.W.; Zhang, T.; et al. Plastid phylogenomic insights into relationships of all flowering plant families. BMC Biol. 2021, 19, 232.
  11. Chen, S.-L.; Yin, X.-M.; Han, J.-P.; Sun, W.; Yao, H.; Song, J.-Y.; Li, X.-W. DNA barcoding in herbal medicine: Retrospective and prospective. J. Pharm. Anal. 2023, 13, 431–441.
  12. Zheng, X.-M.; Wang, J.-R.; Feng, L.; Liu, S.; Pang, H.-B.; Qi, L.; Li, J.; Qiao, W.-H.; Zhang, L.-F.; Cheng, Y.-L.; et al. Inferring the evolutionary mechanism of the chloroplast genome size by comparing whole-chloroplast genome sequences in seed plants. Sci. Rep. 2017, 7, 1555.
  13. Dobrogojski, J.; Adamiec, M.; Luciński, R. The chloroplast genome: A review. Acta Physiol. Plant. 2020, 42, 98.
  14. Knox, E.B.; Downie, S.R.; Palmer, J.D. Chloroplast genome rearrangements and the evolution of giant lobelias from herbaceous ancestors. Mol. Biol. Evol. 1993, 10, 414–430.
  15. Cosner, M.E.; Raubeson, L.A.; Jansen, R.K. Chloroplast DNA rearrangements in Campanulaceae: Phylogenetic utility of highly rearranged genomes. BMC Evol. Biol. 2004, 4, 27.
  16. Daniell, H.; Lin, C.-S.; Yu, M.; Chang, W.-J. Chloroplast genomes: Diversity, evolution, and applications in genetic engineering. Genome Biol. 2016, 17, 134.
  17. Zhou, P.; Lei, W.-S.; Shi, Y.-K.; Liu, Y.-Z.; Luo, Y.; Li, J.-H.; Xiang, X.-G. Plastome evolution, phylogenomics, and DNA barcoding investigation of Gastrochilus (Aeridinae, Orchidaceae), with a focus on the systematic position of Haraella retrocalla. Int. J. Mol. Sci. 2024, 25, 8500.
  18. Li, E.-Z.; Liu, K.-J.; Deng, R.-Y.; Gao, Y.-W.; Liu, X.-Y.; Dong, W.-P.; Zhang, Z.-X. Insights into the phylogeny and chloroplast genome evolution of Eriocaulon (Eriocaulaceae). BMC Plant Biol. 2023, 23, 32.
  19. Wang, J.; He, W.-C.; Liao, X.-Z.; Ma, J.; Gao, W.; Wang, H.-Q.; Wu, D.-L.; Tembrock, R.; Wu, Z.-Q.; Gu, C.-H. Phylogeny, molecular evolution, and dating of divergences in Lagerstroemia using plastome sequences. Hortic. Plant J. 2023, 9, 345–355.
  20. Jiang, D.-Z.; Cai, X.-D.; Gong, M.; Xia, M.-Q.; Xing, H.-H.; Dong, S.-S.; Tian, S.-M.; Li, J.-L.; Lin, J.-Y.; Liu, Y.-Q.; et al. Complete chloroplast genomes provide insights into evolution and phylogeny of Zingiber (Zingiberaceae). BMC Genom. 2023, 24, 30.
  21. Huang, Y.; Jin, X.-J.; Zhang, C.-Y.; Li, P.; Meng, H.-H.; Zhang, Y.-H. Plastome evolution of Engelhardia facilitates phylogeny of Juglandaceae. BMC Plant Biol. 2024, 24, 634.
  22. Qu, J.-R.; Fu, H.-T.; Zhao, Y.; Dai, X.-H.; Lu, L.-Y.; Liu, Y.; Mo, G.-H.; Wen, F.; Li, J.; Bhanot, D.; et al. Comprehensive analysis of 385 chloroplast genomes unveils phylogenetic relationships and evolutionary history in cassava. BMC Plant Biol. 2025, 25, 858.
  23. Iriarte, A.; Lamolle, G.; Musto, H. Codon usage bias: An endless tale. J. Mol. Evol. 2021, 89, 589–593.
  24. Xu, C.; Cai, X.-N.; Chen, Q.-Z.; Zhou, H.-X.; Cai, Y.; Ben, A.-L. Factors affecting synonymous codon usage bias in chloroplast genome of Oncidium Gower Ramsey. Evol. Bioinform. 2011, 7, 271–278.
  25. Li, N.; Li, Y.-Y.; Zheng, C.-C.; Huang, J.-G.; Zhang, S.-Z. Genome-wide comparative analysis of the codon usage patterns in plants. Genes Genom. 2016, 38, 723–731.
  26. Duret, L. tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet. 2000, 16, 287–289.
  27. Romero, H.; Zavala, A.; Musto, H. Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces. Nucleic Acids Res. 2000, 28, 2084–2090.
  28. Fages-Lartaud, M.; Hundvin, K.; Hohmann-Marriott, M.F. Mechanisms governing codon usage bias and the implications for protein expression in the chloroplast of Chlamydomonas reinhardtii. Plant J. 2022, 112, 919–945.
  29. Li, X.-J.; Liu, L.-E.; Ren, Q.-D.; Zhang, T.; Hu, N.; Sun, J.; Zhou, W. Analysis of synonymous codon usage bias in the chloroplast genome of five Caragana. BMC Plant Biol. 2025, 25, 322.
  30. Yang, Y.; Wang, X.-L.; Shi, Z.-J. Comparative study on codon usage patterns across chloroplast genomes of eighteen Taraxacum species. Horticulturae 2024, 10, 492.
  31. Guo, Y.-P.; Shi, M.-W.; Ma, H.; Ma, Y.-J.; Yao, B.-Q. Analysis of codon bias in chloroplast genomes of ten plants in Androsace. Chin. Tradit. Herb. Drugs 2025, 56, 1355–1365, (In Chinese with English Abstract).
  32. Doyle, J.J.; Doyle, J.L. A rapid DNA isolation procedure for small amounts of fresh leaf tissue. Phytochem. Bull. 1987, 19, 11–15.
  33. Bolger, A.M.; Lohse, M.; Usadel, B. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 2014, 30, 2114–2120.
  34. Jin, J.-J.; Yu, W.-B.; Yang, J.-B.; Song, Y.; de Pamphilis, C.W.; Yi, T.-S.; Li, D.-Z. GetOrganelle: A fast and versatile toolkit for accurate de novo assembly of organelle genomes. Genome Biol. 2020, 21, 241.
  35. Tillich, M.; Lehwark, P.; Pellizzer, T.; Ulbricht-Jones, E.S.; Fischer, A.; Bock, R.; Greiner, S. GeSeq-versatile and accurate annotation of organelle genomes. Nucleic Acids Res. 2017, 45, 6–11.
  36. Beier, S.; Thiel, T.; Münch, T.; Scholz, U.; Mascher, M. MISA-web: A web server for microsatellite prediction. Bioinformatics 2017, 33, 2583–2585.
  37. Amiryousefi, A.; Hyvönen, J.; Poczai, P. IRscope: An online program to visualize the junction sites of chloroplast genomes. Bioinformatics 2018, 34, 3030–3031.
  38. Rozas, J.; Ferrer-Mata, A.; Sánchez-DelBarrio, J.C.; Guirao-Rico, S.; Librado, P.; Ramos-Onsins, S.E.; Sánchez-Gracia, A. DnaSP 6: DNA sequence polymorphism analysis of large data sets. Mol. Biol. Evol. 2017, 34, 3299–3302.
  39. Katoh, K.; Standley, D.M. MAFFT multiple sequence alignment software version 7: Improvements in performance and usability. Mol. Biol. Evol. 2013, 30, 772–780.
  40. Talavera, G.; Castresana, J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. 2007, 56, 564–577.
  41. Ronquist, F.; Huelsenbeck, J.P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 2003, 19, 1572–1574.
  42. Swofford, D.L. PAUP*: Phylogenetic Analysis Using Parsimony (and Other Methods), version 4.0b10; Sinauer: Sunderland, MA, USA, 2003.
  43. Bouckaert, R.; Heled, J.; Kühnert, D.; Vaughan, T.; Wu, C.-H.; Xie, D.; Suchard, M.A.; Rambaut, A.; Drummond, A.J. BEAST 2: A software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 2014, 10, e1003537.
  44. Linan, A.G.; Schatz, G.E.; Lowry, P.P.; Miller, A.; Edwards, C.E. Ebony and the Mascarenes: The evolutionary relationships and biogeography of Diospyros (Ebenaceae) in the western Indian Ocean. Bot. J. Linn. Soc. 2019, 190, 359–373.
  45. Rambaut, A.; Drummond, A.J.; Xie, D.; Baele, G.; Suchard, M.A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 2018, 67, 901–904.
  46. Drummond, A.J.; Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 2007, 7, 214.
  47. Moriyama, E.N.; Powell, J.R. Gene length and codon usage bias in Drosophila melanogaster, Saccharomyces cerevisiae and Escherichia coli. Nucleic Acids Res. 1998, 26, 3188–3193.
  48. Fuglsang, A. Impact of bias discrepancy and amino acid usage on estimates of the effective number of codons used in a gene. Gene 2008, 410, 82–88.
  49. Lee, S.; Weon, S.; Lee, S.; Kang, C. Relative codon adaptation index, a sensitive measure of codon usage bias. Evol. Bioinform. 2010, 6, 47–55.
  50. Wright, F. The ‘effective number of codons’ used in a gene. Gene 1990, 87, 23–29.
  51. Vicario, S.; Moriyama, E.N.; Powell, J.R. Codon usage in twelve species of Drosophila. BMC Evol. Biol. 2007, 7, 226.
  52. Tao, P.; Dai, L.; Luo, M.-C.; Tang, F.-Q.; Tien, P.; Pan, Z.-S. Analysis of synonymous codon usage in classical swine fever virus. Virus Genes 2009, 38, 104–112.
  53. Cao, T.-Z.; Li, Q.; Huang, Y.-X.; Li, A.-S. plotnineSeqSuite: A Python package for visualizing sequence data using ggplot2 style. BMC Genom. 2023, 24, 585.
  54. Sueoka, N. Intrastrand parity rules of DNA base composition and usage biases of synonymous codons. J. Mol. Evol. 1995, 40, 318–325.
  55. Sueoka, N. Translation-coupled violation of Parity Rule 2 in human genes is not the cause of heterogeneity of the DNA G+C content of third codon position. Gene 1999, 238, 53–58.
  56. Sueoka, N. Near homogeneity of PR2-bias fingerprints in the human genome and their implications in phylogenetic analyses. J. Mol. Evol. 2001, 53, 469–476.
  57. Guo, Y.-Y.; Yang, J.-X.; Bai, M.-Z.; Zhang, G.-Q.; Liu, Z.-J. The chloroplast genome evolution of Venus slipper (Paphiopedilum): IR expansion, SSC contraction, and highly rearranged SSC regions. BMC Plant Biol. 2021, 21, 248.
  58. Fu, J.-M.; Liu, H.-M.; Hu, J.-J.; Liang, Y.-Q.; Liang, J.-J.; Wuyun, T.; Tan, X.-F. Five complete chloroplast genome sequences from Diospyros: Genome organization and comparative analysis. PLoS ONE 2016, 11, e015956.
  59. Li, W.-Q.; Liu, Y.-L.; Yang, Y.; Xie, X.-M.; Lu, Y.-Z.; Yang, Z.-R.; Jin, X.-B.; Dong, W.-P.; Suo, Z.-L. Interspecific chloroplast genome sequence diversity and genomic resources in Diospyros. BMC Plant Biol. 2018, 18, 210.
  60. Liu, W.-W.; Tan, X.-H.; Zhao, K.-K.; Zhu, Z.-X.; Wang, H.-F. Complete plastome sequences of Diospyros maclurei Merr. and Diospyros hainanensis Merr. (Ebenaceae): Two endemic species in Hainan Province, China. Mitochondrial DNA B 2018, 3, 1205–1207.
  61. Pop, C.; Rouskin, S.; Ingolia, N.T.; Han, L.; Phizicky, E.M.; Weissman, J.S.; Koller, D. Causal signals between codon bias, mRNA structure, and the efficiency of translation and elongation. Mol. Syst. Biol. 2014, 10, 770.
  62. Parvathy, S.T.; Udayasuriyan, V.; Bhadana, V. Codon usage bias. Mol. Biol. Rep. 2022, 49, 539–565.
  63. Hu, H.; Dong, B.-R.; Fan, X.-J.; Wang, M.-X.; Wang, T.-Z.; Liu, Q.-P. Mutational bias and natural selection driving the synonymous codon usage of single-exon genes in rice (Oryza sativa L.). Rice 2023, 16, 11.
  64. Chen, J.; Ma, W.-Q.; Hu, X.-W.; Zhou, K.-B. Synonymous codon usage bias in the chloroplast genomes of 13 oil-tea Camellia samples from south China. Forests 2023, 14, 794.
  65. Jiang, H.; He, S.-L.; He, J.; Zuo, Y.-J.; Guan, W.-L.; Zhao, Y.; Li, X.-J.; Meng, J. Plastid genomic features and phylogenetic placement in Rosa (Rosaceae) through comparative analysis. BMC Plant Biol. 2025, 25, 752.
  66. Zeng, Y.-J.; Shen, L.-W.; Chen, S.-Q.; Qu, S.; Hou, N. Codon usage profiling of chloroplast genome in Juglandaceae. Forests 2023, 14, 378.
  67. Campbell, W.H.; Gowri, G. Codon usage in higher plants, green algae, and cyanobacteria. Plant Physiol. 1990, 92, 1–11.
  68. Wang, Z.-J.; Xu, B.-B.; Li, B.; Zhou, Q.-Q.; Wang, G.-Y.; Jiang, X.-Z.; Wang, C.-C.; Xu, Z.-D. Comparative analysis of codon usage patterns in chloroplast genomes of six Euphorbiaceae species. PeerJ 2020, 8, e8251.
  69. Sharp, P.M.; Elizabeth, C.; Desmond, G.H.; Shields, D.C.; Kenneth, H.W.; Wright, F. Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within species diversity. Nucleic Acids Res. 1988, 16, 8207–8211.
  70. Sharp, P.M.; Emery, L.R.; Zeng, K. Forces that influence the evolution of codon bias. Philos. T. R. Soc. B 2010, 365, 1203–1212.
  71. Zhou, J.-H.; Ding, Y.-Z.; He, Y.; Chu, Y.-F.; Zhao, P.; Ma, L.-Y.; Wang, X.-J.; Li, X.-R.; Liu, Y.-S. The effect of multiple evolutionary selections on synonymous codon usage of genes in the Mycoplasma bovis genome. PLoS ONE 2014, 9, e108949.
  72. Ivanova, Z.; Sablok, G.; Daskalova, E.; Zahmanova, G.; Apostolova, E.; Yahubyan, G.; Baev, V. Chloroplast genome analysis of resurrection tertiary relict Haberlea rhodopensis highlights genes important for desiccation stress response. Front. Plant Sci. 2017, 8, 204.
  73. Gao, C.-M.; Deng, Y.-F.; Wang, J. The complete chloroplast genomes of Echinacanthus species (Acanthaceae): Phylogenetic relationships, adaptive evolution, and screening of molecular markers. Front. Plant Sci. 2019, 9, 1989.
  74. Wang, Z.-J.; Cai, Q.-W.; Wang, Y.; Li, M.-H.; Wang, C.-C.; Wang, Z.-X.; Jiao, C.-Y.; Xu, C.-C.; Wang, H.-Y.; Zhang, Z.-L. Comparative analysis of codon bias in the chloroplast genomes of Theaceae species. Front. Genet. 2022, 13, 824610.
Figure 1. Type and number of SSR in chloroplast genomes of 15 Diospyros species. (A) The total numbers of SSRs in 15 Diospyros species. (B) The numbers of different SSRs types in 15 Diospyros species. (C) The numbers of the repeat sequences in 15 Diospyros species. (F: Forward repeat; R: Reverse repeat; P: Palindromic repeat; C: Complement repeat). (D,E) Analysis of the correlation between genome length and IR length/SSR numbers.
Figure 2. Comparison of LSC, SSC, and IR boundaries revealed both conserved and variable features among the 15 Diospyros chloroplast genomes. JLB, JSB, JSA, and JLA denote connection points of adjacent regions. The arrow marks the distance from the gene to the boundary.
Figure 3. Nucleotide diversity (Pi) analysis of 15 Diospyros chloroplast genomes. Top six hypervariable regions of the two datasets were annotated, respectively.
Figure 4. A dated tree based on the complete chloroplast genome sequences of 15 Diospyros species. The numbers on the branches represent the support values of the Bayesian inference (BI) and maximum parsimony (MP) methods. The following sequences were used: D. cathayensis, MF288576; D. dumetorum, MF179487; D. eriantha, NC_081462; D. ferrea, MG049698; D. glaucifolia, NC_030784; D. hainanensis, NC_042160; D. kaki, NC_030789; D. lotus, NC_030786; D. maclurei, NC_042161; D. morrisiana, NC_081461; D. oleifera, NC_030787; D. rhombifolia, NC_039556; D. strigosa, OP480009; D. sutchuensis, NC_067511; D. tsangii, PX413321; D. vaccinioides, NC_060861. The outgroup is marked in light gray.
Figure 5. Pearson correlation analysis of GC content, ENC, and codon number across 15 species within the genus Diospyros. Asterisks denote statistical significance at p < 0.01, while dots indicate significance at p < 0.05.
Figure 6. The optimal codons in the chloroplast genomes of 15 Diospyros species. The 9 codons within the middle circle are the optimal codons shared by all 15 species. Green represents codons ending with G or C, while red and purple represent codons ending with A or U.
Table 1. The complete chloroplast genome features of 15 Diospyros species.
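| Species | ID No. | Genome Size (bp) | LSC Length (bp) | SSC Length (bp) | IR Length (bp) | Gene Content | PCGs | tRNA Genes | rRNA Genes | GC% |
|---|---|---|---|---|---|---|---|---|---|---|
| D. oleifera | NC030787 | 157,724 | 87,054 | 18,522 | 26,074 | 132 | 87 | 37 | 8 | 37.4 |
| D. tsangii | PX413321 | 157,445 | 86,744 | 18,523 | 26,089 | 132 | 87 | 37 | 8 | 37.4 |
| D. kaki | NC030789 | 157,784 | 87,109 | 18,536 | 26,068 | 132 | 87 | 37 | 8 | 37.4 |
| D. vaccinioides | NC060861 | 157,778 | 87,066 | 18,534 | 26,089 | 132 | 87 | 37 | 8 | 37.4 |
| D. glaucifolia | NC030784 | 157,593 | 86,974 | 18,413 | 26,103 | 132 | 87 | 37 | 8 | 37.4 |
| D. lotus | NC030786 | 157,590 | 86,944 | 18,416 | 26,115 | 132 | 87 | 37 | 8 | 37.4 |
| D. morrisiana | NC081461 | 157,737 | 87,104 | 18,455 | 26,089 | 132 | 87 | 37 | 8 | 37.4 |
| D. maclurei | NC042161 | 157,946 | 87,387 | 18,397 | 26,081 | 132 | 87 | 37 | 8 | 37.4 |
| D. hainanensis | NC042160 | 157,999 | 87,523 | 18,322 | 26,077 | 132 | 87 | 37 | 8 | 37.4 |
| D. strigosa | OP480009 | 157,371 | 87,156 | 18,467 | 25,874 | 132 | 87 | 37 | 8 | 37.4 |
| D. eriantha | NC081462 | 157,432 | 87,181 | 18,471 | 25,890 | 132 | 87 | 37 | 8 | 37.4 |
| D. dumetorum | MF179487 | 157,834 | 86,995 | 18,479 | 26,180 | 132 | 87 | 37 | 8 | 37.4 |
| D. rhombifolia | NC039556 | 157,368 | 87,233 | 18,325 | 25,910 | 132 | 87 | 37 | 8 | 37.4 |
| D. cathayensis | MF288576 | 157,689 | 87,176 | 18,349 | 26,082 | 132 | 87 | 37 | 8 | 37.4 |
| D. sutchuensis | NC067511 | 157,917 | 87,303 | 18,392 | 26,111 | 132 | 87 | 37 | 8 | 37.4 |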
Table 2. Basic parameters of codon usage bias of chloroplast genome in Diospyros.
| Species | Codon No. | GC1 | GC2 | GC3 | GC_all | GC3s | ENC_AVG | ENC_MIN–ENC_MAX | Genes with ENC ≤ 35 |
|---|---|---|---|---|---|---|---|---|---|
| D. oleifera | 20,911 | 0.4697 | 0.3958 | 0.2788 | 0.3815 | 0.2801 | 45.25 | 33.81 (rps18)–54.25 (ycf3) | rps18, rpl16 |
| D. tsangii | 20,911 | 0.4698 | 0.3958 | 0.2791 | 0.3816 | 0.2805 | 45.27 | 33.81 (rps18)–54.70 (ycf3) | rps18, rpl16 |
| D. kaki | 20,913 | 0.4699 | 0.3958 | 0.2788 | 0.3815 | 0.2799 | 45.19 | 33.75 (rpl16)–54.25 (ycf3) | rpl16, rps18 |
| D. vaccinioides | 20,921 | 0.4697 | 0.3950 | 0.2778 | 0.3808 | 0.2789 | 45.17 | 33.75 (rpl16)–54.25 (ycf3) | rpl16, rps18 |
| D. glaucifolia | 20,921 | 0.4695 | 0.3946 | 0.2780 | 0.3807 | 0.2792 | 45.38 | 32.37 (rps18)–54.69 (ycf3) | rps18, rpl16 |
| D. lotus | 20,913 | 0.4697 | 0.3954 | 0.2790 | 0.3814 | 0.2803 | 45.43 | 32.37 (rps18)–54.69 (ycf3) | rps18, rpl16 |
| D. morrisiana | 20,921 | 0.4700 | 0.3944 | 0.2770 | 0.3805 | 0.2782 | 45.31 | 33.81 (rps18)–54.89 (ycf3) | rps18, rpl16 |
| D. maclurei | 20,219 | 0.4697 | 0.3958 | 0.2785 | 0.3813 | 0.2796 | 45.38 | 33.40 (rps18)–54.25 (ycf3) | rps18, rpl16 |
| D. hainanensis | 20,857 | 0.4709 | 0.3958 | 0.2787 | 0.3818 | 0.2800 | 45.31 | 34.24 (rpl16)–53.75 (ycf3) | rpl16, rps18, rps14 |
| D. strigosa | 20,957 | 0.4705 | 0.3949 | 0.2773 | 0.3809 | 0.2784 | 45.13 | 33.81 (rps18)–54.25 (ycf3) | rps18, rps14, rpl16 |
| D. eriantha | 20,958 | 0.4705 | 0.3948 | 0.2775 | 0.3809 | 0.2786 | 45.18 | 33.81 (rps18)–54.25 (ycf3) | rps18, rps14, rpl16 |
| D. dumetorum | 20,921 | 0.4703 | 0.3953 | 0.2788 | 0.3814 | 0.2802 | 45.36 | 34.47 (rps18)–54.25 (ycf3) | rps18, rps14, rpl16 |
| D. rhombifolia | 20,912 | 0.4706 | 0.3957 | 0.2784 | 0.3816 | 0.2796 | 45.21 | 33.81 (rps18)–53.66 (ycf3) | rps18 |
| D. cathayensis | 20,905 | 0.4706 | 0.3957 | 0.2790 | 0.3818 | 0.2802 | 45.31 | 33.81 (rps18)–53.66 (ycf3) | rps18 |
| D. sutchuensis | 20,886 | 0.4706 | 0.3954 | 0.2785 | 0.3815 | 0.2797 | 45.26 | 33.81 (rps18)–53.66 (ycf3) | rps18 |

Citation

Zhang, J.; Li, Z. Evolutionary Dynamics of Chloroplast Genome and Codon Usage in the Genus Diospyros (Ebenaceae). Biology 2025, 14, 1568. https://doi.org/10.3390/biology14111568
