Ten Plastomes of Crassula (Crassulaceae) and Phylogenetic Implications

Simple Summary Plastids are semi-autonomous plant organelles which play critical roles in photosynthesis, stress response, and storage. The plastid genomes (plastomes) in angiosperms are relatively conserved in quadripartite structure, but variable in size, gene content, and evolutionary rates of genes. The genus Crassula L. is the second-largest genus in the family Crassulaceae J.St.-Hil, that significantly contributes to the diversity of Crassulaceae. However, few studies have focused on the evolution of plastomes within Crassula. In the present study, we sequenced ten plastomes of Crassula: C. alstonii Marloth, C. columella Marloth & Schönland, C. dejecta Jacq., C. deltoidei Thunb., C. expansa subsp. fragilis (Baker) Toelken, C. mesembrianthemopsis Dinter, C. mesembryanthoides (Haw.) D.Dietr., C. socialis Schönland, C. tecta Thunb., and C. volkensii Engl. Through comparative studies, we found Crassula plastomes have unique codon usage and aversion patterns within Crassulaceae. In addition, genomic features, evolutionary rates, and phylogenetic implications were analyzed using plastome data. Our findings will not only reveal new insights into the plastome evolution of Crassulaceae, but also provide potential molecular markers for DNA barcoding. Abstract The genus Crassula is the second-largest genus in the family Crassulaceae, with about 200 species. As an acknowledged super-barcode, plastomes have been extensively utilized for plant evolutionary studies. Here, we first report 10 new plastomes of Crassula. We further focused on the structural characterizations, codon usage, aversion patterns, and evolutionary rates of plastomes. The IR junction patterns—IRb had 110 bp expansion to rps19—were conservative among Crassula species. Interestingly, we found the codon usage patterns of matK gene in Crassula species are unique among Crassulaceae species with elevated ENC values. Furthermore, subgenus Crassula species have specific GC-biases in the matK gene. In addition, the codon aversion motifs from matK, pafI, and rpl22 contained phylogenetic implications within Crassula. The evolutionary rates analyses indicated all plastid genes of Crassulaceae were under the purifying selection. Among plastid genes, ycf1 and ycf2 were the most rapidly evolving genes, whereas psaC was the most conserved gene. Additionally, our phylogenetic analyses strongly supported that Crassula is sister to all other Crassulaceae species. Our findings will be useful for further evolutionary studies within the Crassula and Crassulaceae.


Nucleotide Substitution Rate Analyses
The 79 PCGs from 87 species of Crassulaceae were employed to evaluate the evolutionary rates (Table S1). The percentage of variable sites (PV) and average π values were measured with DnaSP v6.12 (Departament de Genètica, Universitat de Barcelona, Barcelona, Spain) [46]. The nucleotide substitution rates, including dN, dS, and dN/dS, were inferred with PAML v4.9 [55] under F3X4 and M0 model.

Plastome Organizations and Structural Features
Furthermore, based on the results obtained with mVISTA, in all plastomes investigated it was found that the IR and coding regions (exons, tRNAs, and rRNAs) are more conserved than SC and conserved non-coding regions (CNS), respectively ( Figure 2). Additionally, the results also revealed that 3 plastomes (labelled 8-10) of subgenus Disporocarpa exhibited higher divergences than 7 plastomes (labelled 1-7) of subgenus Crassula, when compared with the reference.    Furthermore, based on the results obtained with mVISTA, in all plastomes investigated it was found that the IR and coding regions (exons, tRNAs, and rRNAs) are more conserved than SC and conserved non-coding regions (CNS), respectively ( Figure 2). Additionally, the results also revealed that 3 plastomes (labelled 8−10) of subgenus Disporocarpa exhibited higher divergences than 7 plastomes (labelled 1−7) of subgenus Crassula, when compared with the reference.  Y-scale represents the percent identity between 50% and 100%. The labels 0 to 10 indicate C. perforata (reference), C. alstonii, C. columella, C. dejecta, C. mesembryanthoides, C. tecta, C. mesembrianthemopsis, C. socialis, C. volkensii, C. expansa subsp. fragilis, and C. deltoidei, respectively.
The sliding-window-based π values estimated for 11 plastomes of Crassula ranged from 0.00073 to 0.10315 (Table S2 and Data S2). The mean π value and its standard deviation were 0.02978 and 0.01954, respectively. Thus, a total of 11 HVRs were identified with relatively high variability (π > 0.06886) ( Figure 3). These HVRs containing high π values (0.06912-0.08653) and abundant variable sites (111-559) might be used as potential DNA barcodes for species identification within Crassula ( Table 2).
The sliding-window-based π values estimated for 11 plastomes of Crassula ranged from 0.00073 to 0.10315 (Table S2 and Data S2). The mean π value and its standard deviation were 0.02978 and 0.01954, respectively. Thus, a total of 11 HVRs were identified with relatively high variability (π > 0.06886) (Figure 3). These HVRs containing high π values (0.06912−0.08653) and abundant variable sites (111−559) might be used as potential DNA barcodes for species identification within Crassula (Table 2).   Regions with higher π values (π > 0.06886) were considered as HVRs. In our current study, all 11 plastomes of Crassula displayed similar IR junction patterns ( Figure 4). The SSC/IRa borders are located in the coding regions of ycf1 gene, resulting in the fragmentations of ycf1 (ycf1-fragment) in IRb regions. Moreover, ndhF genes were discovered to occur mainly in SSC, and partly in IRb, regions. Notably, rps19 genes are located at the LSC/IRb junctions, with extension into the IRb regions for 110 bp. Similarly, trnH genes lie at the IRa/LSC junctions, with uniform 3 bp-sized expansions to the IRa regions.
In our current study, all 11 plastomes of Crassula displayed similar IR junction patterns ( Figure 4). The SSC/IRa borders are located in the coding regions of ycf1 gene, resulting in the fragmentations of ycf1 (ycf1-fragment) in IRb regions. Moreover, ndhF genes were discovered to occur mainly in SSC, and partly in IRb, regions. Notably, rps19 genes are located at the LSC/IRb junctions, with extension into the IRb regions for 110 bp. Similarly, trnH genes lie at the IRa/LSC junctions, with uniform 3 bp-sized expansions to the IRa regions. Blue, orange and green blocks represent the LSC, IR and SSC regions, respectively. Gene boxes represented above the block were transcribed clockwise and those represented below the block were transcribed clockwise. "fra." is the abbreviation of "fragment".

Codon Usage and Aversion Patterns
To compare the patterns of codon usage and aversion between Crassula and other Crassulaceae species, four analyses (RSCU, ENC, PR2-plot, and codon aversion motif) of 53 plastid genes (length ≥ 300 bp) were performed.
The overall RSCU values ranged from 0.32 (CTC or AGC) to 2.07 (TTA) among Crassulaceae species (Table S3). Similar with other Crassulaceae species, seven taxa of Crassula exhibited significant preference for A/T-ending codons over G/C-ending codons in plastid genes ( Figure 5). Importantly, the RSCU heatmap showed two subgenera within the Blue, orange and green blocks represent the LSC, IR and SSC regions, respectively. Gene boxes represented above the block were transcribed clockwise and those represented below the block were transcribed clockwise. "fra." is the abbreviation of "fragment".

Codon Usage and Aversion Patterns
To compare the patterns of codon usage and aversion between Crassula and other Crassulaceae species, four analyses (RSCU, ENC, PR2-plot, and codon aversion motif) of 53 plastid genes (length ≥300 bp) were performed.
The overall RSCU values ranged from 0.32 (CTC or AGC) to 2.07 (TTA) among Crassulaceae species (Table S3). Similar with other Crassulaceae species, seven taxa of Crassula exhibited significant preference for A/T-ending codons over G/C-ending codons in plastid genes ( Figure 5). Importantly, the RSCU heatmap showed two subgenera within the Crassula: subgenus Disporocarpa included C. expansa subsp. fragilis, C. deltoidea and C. volkensii; subgenus Crassula consisted of the remaining eight taxa ( Figure 5).
The ENC values ranged from 30.83 (ndhC in Sedum sarmentosum Bunge) to 57.74 (ndhJ in C. volkensii and C. expansa subsp. fragilis) among Crassulaceae species (Table S4). Generally, ENC values ≤35 indicate high codon preference [52,61,62]. The results show that most of the ENC values (99.48%) were higher than 35, indicating a weaker bias. Most surprisingly of all, we detected the ENC values of matK, from the Crassula clade, are significantly higher than those of all other clades (Table S4 and Figure 6). It might prove to be a unique feature for Crassula species. To further verify this finding, more sampling data and comprehensive analyses are need in future studies.  Figure 5).  erally, ENC values ≤ 35 indicate high codon preference [52,61,62]. The results show that most of the ENC values (99.48%) were higher than 35, indicating a weaker bias. Most surprisingly of all, we detected the ENC values of matK, from the Crassula clade, are significantly higher than those of all other clades (Table S4 and Figure 6). It might prove to be a unique feature for Crassula species. To further verify this finding, more sampling data and comprehensive analyses are need in future studies. The PR2 plots of matK and 52 other PCGs are presented in Figures 7 and S1, respectively. These results indicated the nucleotide usage at the 3rd codon site of 4-fold degenerate codons is uneven in different genes. For example, rps14, clpP, psbA, and pafII prefer to use A/G, A/C, T/C, and T/G in 4-fold degenerate sites, respectively ( Figure S1). In addition, these unbalanced utilizations were also found in different species ( Figure S1). Obviously divergent GC-biases were observed in matK genes between species of subgenus Crassula and others. Specifically, all GC-biases of clades from Kalanchoideae and Sempervivoideae, plus subgenus Disporocarpa, were less than 0.5. On the contrary, all these values for the subgenus Crassula were higher than 0.5, which might be unique characteristic for subgenus Crassula. Moreover, species with close relationships had identical nucleotide biases. For example, C. alstonii and C. columella had identical AT-biases (0.4074) and GCbiases (0.5455). Similar phenomena could also be observed in C. mesembryanthoides and C. tecta (AT-biases = 0.4286, and GC-biases = 0.5455). The PR2 plots of matK and 52 other PCGs are presented in Figures 7 and S1, respectively. These results indicated the nucleotide usage at the 3rd codon site of 4-fold degenerate codons is uneven in different genes. For example, rps14, clpP, psbA, and pafII prefer to use A/G, A/C, T/C, and T/G in 4-fold degenerate sites, respectively ( Figure S1). In addition, these unbalanced utilizations were also found in different species ( Figure S1). Obviously divergent GC-biases were observed in matK genes between species of subgenus Crassula and others. Specifically, all GC-biases of clades from Kalanchoideae and Sempervivoideae, plus subgenus Disporocarpa, were less than 0.5. On the contrary, all these values for the subgenus Crassula were higher than 0.5, which might be unique characteristic for subgenus Crassula. Moreover, species with close relationships had identical nucleotide biases. For example, C. alstonii and C. columella had identical AT-biases (0.4074) and GC-biases (0.5455). Similar phenomena could also be observed in C. mesembryanthoides and C. tecta (AT-biases = 0.4286, and GC-biases = 0.5455).
Owing to the codon aversion motifs containing phylogenetic implication, we analyzed codon aversion patterns of genes among Crassulaceae species. Except for rpoB, rpoC2, ycf1 and ycf2, codon aversion motifs were found in the remaining 49 genes (Table S5). It is worth noting that 27 and 16 unique codon aversion motifs were detected for species of subgenus Crassula and subgenus Disporocarpa, respectively (Table 3), which might be used as potential biomarkers for species identification. Further to this, 8 consensus motifs might be considered as the feature of genus Crassula (Table 3). Moreover, the codon aversion motifs from 3 genes (matK, pafI and rpl22) could also divide 11 species into two subgenera (subgenus Crassula and subgenus Disporocarpa) (Figure 8), which is congruent with results from RSCU heatmap. Owing to the codon aversion motifs containing phylogenetic implication, we analyzed codon aversion patterns of genes among Crassulaceae species. Except for rpoB, rpoC2, ycf1 and ycf2, codon aversion motifs were found in the remaining 49 genes (Table  S5). It is worth noting that 27 and 16 unique codon aversion motifs were detected for species of subgenus Crassula and subgenus Disporocarpa, respectively (Table 3), which might be used as potential biomarkers for species identification. Further to this, 8 consensus motifs might be considered as the feature of genus Crassula (Table 3). Moreover, the codon aversion motifs from 3 genes (matK, pafI and rpl22) could also divide 11 species into two subgenera (subgenus Crassula and subgenus Disporocarpa) (Figure 8), which is congruent with results from RSCU heatmap.

Evolutionary Rates and Patterns
The π (0.00447-0.0914) and PV (4.91-37.52%) values of 79 plastid PCGs of Crassulaceae species were plotted in Figure 9a. Two genes, referring to ycf1 (π = 0.0914, PV = 35.78%) and matK (π = 0.08239, PV = 37.52%), had obviously higher π and PV values than those of the other 77 genes, indicating they might accumulate more mutations than other plastid genes. The detailed data are listed in Table S6.

Evolutionary Rates and Patterns
The π (0.00447-0.0914) and PV (4.91-37.52%) values of 79 plastid PCGs of Crassulaceae species were plotted in Figure 9a. Two genes, referring to ycf1 (π = 0.0914, PV = 35.78%) and matK (π = 0.08239, PV = 37.52%), had obviously higher π and PV values than those of the other 77 genes, indicating they might accumulate more mutations than other plastid genes. The detailed data are listed in Table S6. To further quantify the evolutionary rates of PCGs, the nucleotide substitution rates, including dN, dS and dN/dS, were calculated ( Figure 9b, Table S6). The dN values ranged from 0 to 0.8671, with higher dN values for ycf1 (dN = 0.8671) and matK (dN = 0.7804) than for others. Compared with dN values, the dS values had relatively wide ranges (0.177-2.3917), resulting in corresponding dN/dS ratios (0-0.5891) of less than 1. This finding indicates the plastid genes from Crassulaceae appear to be evolving under a purifying selective constraint. Among 79 plastid PCGs, ycf2 is the most rapidly evolving gene, with the highest ratio (dN/dS = 0.5891), followed by ycf1, cemA, psaI, and matK. By contrast, psaC was the most conserved gene with the lowest ratio (dN/dS = 0).

Phylogenetic Implications
To investigate the evolutionary relationships among 87 species of Crassulaceae, phylogenetic analyses were performed. After a model test, GTR + G4 and GTR + I+G4 were inferred as the optimal substitution models for most genes (the detailed models can be seen in Table S7). As shown in Figure 10, the trees inferred from two methods displayed the same topology. To further quantify the evolutionary rates of PCGs, the nucleotide substitution rates, including dN, dS and dN/dS, were calculated ( Figure 9b, Table S6). The dN values ranged from 0 to 0.8671, with higher dN values for ycf1 (dN = 0.8671) and matK (dN = 0.7804) than for others. Compared with dN values, the dS values had relatively wide ranges (0.177-2.3917), resulting in corresponding dN/dS ratios (0-0.5891) of less than 1. This finding indicates the plastid genes from Crassulaceae appear to be evolving under a purifying selective constraint. Among 79 plastid PCGs, ycf2 is the most rapidly evolving gene, with the highest ratio (dN/dS = 0.5891), followed by ycf1, cemA, psaI, and matK. By contrast, psaC was the most conserved gene with the lowest ratio (dN/dS = 0).

Phylogenetic Implications
To investigate the evolutionary relationships among 87 species of Crassulaceae, phylogenetic analyses were performed. After a model test, GTR + G4 and GTR + I+G4 were inferred as the optimal substitution models for most genes (the detailed models can be seen in Table S7). As shown in Figure 10, the trees inferred from two methods displayed the same topology. Ten species of Crassula that we sequenced, together with C. perforate, form the wellsupported clade Crassula, which is sister to all other Crassulaceae species (maximum Ten species of Crassula that we sequenced, together with C. perforate, form the well-supported clade Crassula, which is sister to all other Crassulaceae species (maximum likelihood bootstrap [BS] = 100 and bayesian posterior probability [PP] = 1.00). In addition, our phylogenetic tree indicated that this monophyletic clade could be clustered into two subgenera: subgenus Disporocarpa harbored C. volkensii, C. expansa subsp.

Discussion
Ten new plastomes of Crassula were reported in the present study. Combined with available data from public database, we conducted comprehensive analyses, including plastome organizations, codon usage and aversion patterns, evolutionary rates, and phylogenetic implications.
The expansion and contraction of IR regions are common evolutionary events and have been considered as the main mechanism for the length variation of angiosperm plastomes [64][65][66]. In our study, we performed comparative analyses among Crassula plastomes, and found that the IRb regions had uniform length (110 bp) expansions to the rps19 gene. This 110-bp expansion had been also observed in Aeonium, Monanthes, and most other taxa of Crassulaceae in our recent study [17]. This finding indicated that the conserved IR organization might act as a family-specific marker for Crassulaceae species.
Interestingly, it was reported that rps19 genes were completely located in the LSC regions in Forsythia suspensa (Thunb.) Vahl, Olea europaea Hoffmanns. & Link L., and Quercus litseoides Dunn [67,68], and were fully encoded by the IR regions in Polystachya adansoniae Rchb.f., Polystachya bennettiana Rchb.f., and Dracaena cinnabari Balf.f. [69,70]. There are several mechanisms that might explain the IR expansion and contraction [71][72][73]. For instance, Goulding et al. [71] proposed that short IR expansions may occur by gene conversion events, whereas large IR expansions involved in double-strand DNA breaks. In order to better reveal the mechanisms of IR expansion and contraction, more extensive investigations in Crassulaceae and Saxifragales are required.
Investigations of codon usage patterns could reveal phylogenetic relationships between organisms [25,74]. In particular, 11 species of Crassula can be divided into two subgenera from the RSCU heatmap, which agreed with the results of phylogenetic analyses. This finding further demonstrates that RSCU values contain phylogenetic implications [75][76][77][78][79][80]. Additionally, we observed codon usage patterns are gene-specific and/or species-specific, reflected in diversified ENC values and various distribution patterns in PR2 plots. Interestingly, we found the codon usage patterns of matK gene in Crassula species are unique among Crassulaceae species with elevated ENC values. Furthermore, the GC-biases of matK gene with specific preference (>0.5) might be the particular feature for subgenus Crassula. Due to rapid evolutionary rate, high universality, and significant interspecific divergence, the matK gene has been broadly used in plant evolutionary studies as one of the core DNA barcodes [9,10,[81][82][83][84].
Codon aversion, a novel concept proposed by Miller et al. [29][30][31], is an informative character in phylogenetics. Specifically, the codon aversion motifs in orthologous genes are generally conserved in specific lineages [29][30][31]. To date, these analyses have only been performed in a few plant plastomes [17,26]. For example, the specific codon aversion motifs of rpoA gene could distinguish not only the two genera (Aeonium and Monanthes), but also the three subclades of Aeonium in our recent report [17]. In this work, genus-specific and subgenus-specific codon aversion motifs were identified for 11 Crassula species. These findings suggest codon aversion pattern could be used as a promising tool for phylogenetic study.
Generally, the dN/dS ratios of genes could reflect the extent of selection pressures during evolution [22]. Here, the dN/dS values of plastid PCGs ranged from 0 to 0.5891 within Crassulaceae, indicating all plastid genes were under purifying selection. Among these values, elevated dN/dS ratios were found for ycf1 (0.4349) and ycf2 (0.5891). Similarly, high dN/dS ratios of these two genes were also observed in other families, such as Asteraceae Bercht. & J.Presl [38], Mazaceae Reveal [22], and Musaceae Juss. [13]. The ycf1 gene was related to protein translocation [85]. The ycf2 gene is necessary for cell viability, but the detail function is still unknown [86]. Why ycf1 and ycf2 evolve relatively fast is interesting. The possible reason put forwarded by Barnard-Kubow et al. [87] considered that relaxed purifying selection or positive selection on ycf1, ycf2 and some other genes might result in the development of reproductive isolation and subsequent speciation in plants. Therefore, the results suggested that ycf1 and ycf2 might play important roles in the divergence of Crassulaceae.
Our phylogenetic tree divided 87 species into 3 subfamilies and 7 clades. The clade Crassula is sister to all other 6 clades, which agrees with the phylogeny reported by Gontcharova et al. [4], Chang et al. [6], and Han et al. [17]. Furtherly, 11 Crassula species could be furtherly divided into two subgenera, which generally accords with the morphological differences (floral shape) reported by Bruyns et al. [10] (Table S8). Nevertheless, there are still some unsolved phylogenetic problems within Crassulaceae. The first problem is that the plastid phylogeny of Crassula is not entirely clear due to the limited data. According to the classification proposed by Tölken [11,88], 11 and 9 sections were respectively identified in subgenus Crassula and subgenus Disporocarpa. However, Bruyns et al. [10] indicated that most sections were not monophyletic. Moreover, subgenus Disporocarpa recently has been regarded as a paraphyletic group [9,10]. The second is the genus Sedum, which is not monophyletic in our study, agreeing with the widely accepted viewpoint [3][4][5]89,90]. Finally, the genus Orostachys has been demonstrated to be non-monophyletic based on plastid data, which is consistent with previous analysis based on nuclear internal transcribed spacers (ITS) data [63]. In order to better understand the phylogeny of Crassula or Crassulaceae, more data are needed for the further detailed analyses.

Conclusions
In the present study, 10 new plastomes of Crassula species were reported. These plastomes exhibited identical gene content and order, and that they contained 134 genes (130 functional gene and 4 pseudogenes). The 11 identified HVRs with relatively high variability (π > 0.06886) might be used as potential DNA barcodes for species identification within Crassula. The unique expansion pattern, where the IRb regions had uniform length (110 bp) boundary expansions to rps19, might become a plesiomorphy of Crassulaceae. According to RSCU values, the A/T-ending codons were favored in plastid genes. Most importantly, we found the codon usage patterns of the matK gene in Crassula species are unique among Crassulaceae species with elevated ENC values. Furthermore, subgenus Crassula species have specific GC-biases in the matK gene. In addition, the codon aversion motifs from matK, pafI and rpl22 contained phylogenetic implications within Crassula. Compared with other Crassulaceae species, 27 and 16 unique codon aversion motifs were detected for subgenus Crassula and subgenus Disporocarpa, respectively. Additionally, the evolutionary rates analyses indicated all plastid genes of Crassulaceae were under purifying selection. Among these genes, ycf1 (dN/dS = 0.4349) and ycf2 (dN/dS = 0.5891) were the most rapidly evolving genes, whereas psaC (dN/dS = 0) was the most conserved gene. Finally, our phylogenetic analyses strongly supported Crassula is sister to all other Crassulaceae species. Our results will be benefit for further evolutionary studies within the Crassula and Crassulaceae.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The sequence data generated in this study are available in GenBank of the National Center for Biotechnology Information (NCBI) under the access numbers: OP729482-OP729487 and OP882297-OP882300.