Characterization of the Complete Mitochondrial Genome of Wintersweet (Chimonanthus praecox) and Comparative Analysis within Magnoliids

Mitochondrial genome sequencing is a valuable tool for investigating mitogenome evolution, species phylogeny, and population genetics. Chimonanthus praecox (L.) Link, also known as “La Mei” in Chinese, is a famous ornamental and medicinal shrub belonging to the family Calycanthaceae of the order Laurales. Although the nuclear genomes and chloroplast genomes of certain Laurales representatives, such as Lindera glauca, Laurus nobilis, and Piper nigrum, have been sequenced, the mitochondrial genome of Laurales members remains unknown. Here, we report the first complete mitogenome of C. praecox. The mitogenome was 972,347 bp in length and comprised 60 unique genes, including 40 protein-coding genes (PCGs), 17 tRNA genes, and three rRNA genes. The AT skew of the PCGs (−0.0096233) was negative, while the GC skew (0.031656) was positive, indicating higher T and G contents than A and C contents in the PCGs of C. praecox. The Ka/Ks ratio analysis showed that the Ka/Ks values of most genes were less than one, suggesting that these genes were under purifying selection. Furthermore, dispersed repeats are abundant in C. praecox, constituting 16.98% of the total mitochondrial genome. A total of 731 SSRs were identified in the mitogenome, the highest number among the eleven available magnoliid mitogenomes. The mitochondrial phylogenetic analysis based on 29 conserved PCGs placed C. praecox in Lauraceae and supported the sister relationship of Laurales with Magnoliales, which was congruent with the nuclear genome evidence. The present study enriches the mitogenome data of C. praecox and promotes further studies on phylogeny and plastid evolution.


Introduction
Chimonanthus praecox (L.) Link 1822 (commonly known as wintersweet or "La Mei" in Chinese) is a perennial deciduous shrub of the Calycanthaceae family [1]. It is an essential ornamental plant native to China and is widely used in bonsai cultivation, landscaping, and cut flower production because of its distinctive fragrance and blossoming time (from late November to the following March). Thus, wintersweet is planted in almost all Chinese gardens and green spaces. It has also been used for centuries for medicinal purposes, including the treatment of rheumatism, measles, cough, and heatstroke, especially in traditional Chinese medicine [2,3]. Modern pharmacological research has found that wintersweet exhibits multiple therapeutic activities, such as antitumor, antiviral, and immunomodulatory effects [4,5].
C. praecox belongs to the Laurales, a part of Magnoliidae (magnoliids), which also includes the orders Piperales, Canellales, and Magnoliales [6,7]. Magnoliidae are the third-largest group of Mesangiospermae, containing approximately 9000 species. Some magnoliids share anatomical similarities with gymnosperms, such as carpels situated on the flowering axis, and are considered to be early-diverging lineages of angiosperms [8].
The analysis of the evolutionary problems of magnoliids helps us understand the evolution of angiosperms, which gives them a pivotal position in plant evolution research. Apart from the non-mesangiosperms (the ANA grade, i.e., Amborellales, Nymphaeales, and Austrobaileyales), magnoliids are key nodes for revealing the evolution of angiosperms. An increasing number of magnoliid genomes (such as Cinnamomum kanehirae, Liriodendron chinense, Magnolia biondii, Chimonanthus praecox, Chimonanthus salicifolius, Aristolochia fimbriata, Piper nigrum, and Phoebe bournei) facilitate studies of flowering plant evolution and the mechanisms of many important traits. For the past few decades, scientists have grappled with the problems of angiosperm origin and evolution [9][10][11]. However, the evolutionary problems of magnoliids remain controversial. For example, analyses of the recently released magnoliid genomes (Piper nigrum, Liriodendron chinense, Cinnamomum kanehirae, and Persea americana) have led to two conflicting conclusions regarding their relationship with monocots and eudicots [12][13][14][15]. Frequent lineage-specific whole genome duplication (WGD) events have occurred throughout the evolutionary history of magnoliids. For instance, two rounds of ancient WGD in the genome of Chimonanthus salicifolius, three successive rounds of lineage-specific WGD in black pepper, a single round of WGD in Liriodendron chinense, and two rounds of WGD in Cinnamomum kanehirae were observed [10,16,17]. This resulted in rapid gene gain and loss, pseudogenization, and purifying selection, complicating the evolutionary history of homologous genes [18]. The rapid radiation, ancient lineage sorting, and hybridization observed in other angiosperms may partly explain why the evolutionary issues in magnoliids remain elusive [19][20][21].
Although more than 23 nuclear genomes from magnoliid species have been reported and great efforts have been made to study the evolution of magnoliids using molecular data such as nuclear, plastid, or mitochondrial genomes, the current understanding of magnoliid phylogeny remains inconsistent [22,23]. Thus, mitochondrial genomes, which house the oxidative phosphorylation machinery and many other essential metabolic pathways, may provide additional evidence for the phylogenetics of magnoliids [24]. Unfortunately, the mitochondrial genomes of many important taxa, such as Calycanthaceae, Monimiaceae, and Canellaceae, are not yet available.
As a vital ornamental and medicinal plant, wintersweet also plays an important role in clarifying the evolution of angiosperms. The chromosome-level genomes of C. praecox and the cultivar C. praecox cv. Concolor have been released, providing new insights into floral scent biosynthesis and winter flowering [25][26][27]. However, the mitochondrial genomes of C. praecox and many species in Laurales are yet to be elucidated. The mitochondrial genome conventionally shares conserved maternal inheritance with the chloroplast genome, but it is more complex, with various structural features. For example, the mitochondrial genome usually consists of a single double-stranded circular DNA in many plants, similar to the chloroplast genome in Arabidopsis thaliana and Cucurbita pepo, while linear DNA and multiple circular DNA molecules have also been found in rice, wheat, and cucumber [28][29][30][31][32]. Plant mitogenomes vary greatly in size, normally ranging from 66 kb to more than 10 Mb [33]. Meanwhile, they evolve rapidly in structure, such that even close relatives show clear differences, as in cotton [34]. These peculiarities make their assembly difficult. Currently, there are approximately 700 sequenced complete mitogenomes, substantially fewer than the number of plastomes [35].
In the present study, we assembled and annotated the mitogenome of C. praecox to analyze its gene content, repetitive sequences, codon usage, and synonymous (Ks) and nonsynonymous (Ka) substitutions. Comparative analysis with the available mitogenomes of magnoliids was also performed. This mitogenome will be a valuable resource for further studies on the mitochondrial genome evolution and phylogeny of angiosperms.

Plant Materials, Library Preparation, and Genome Sequencing
The fresh young leaves of a single Chimonanthus praecox plant were collected in Bailiang Town (34°09′ N, 114°16′ E), Yanling County, Henan Province, China. A voucher specimen was deposited at Xuchang University, Xuchang City, Henan Province, China. Total genomic DNA was extracted using a modified CTAB (cetyl trimethyl ammonium bromide) method [36], and its quality was checked using agarose gel electrophoresis and a Nanodrop 2000 Spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA). After quality testing, short-insert (350 bp) libraries were constructed using the manufacturer's standard PCR-free protocol and sequenced on the Illumina HiSeq2500 platform (Illumina, San Diego, CA, USA).

Codon Usage Analysis
Relative synonymous codon usage (RSCU) is the ratio of the observed frequency of a particular codon to the frequency expected if all synonymous codons for the same amino acid were used equally. An RSCU of 1 indicates that codon usage is unbiased, whereas RSCU < 1 or RSCU > 1 indicates that the codon is used less or more frequently, respectively, than expected under equal usage [43]. The mitogenome's total codon distribution (count) and RSCU were analyzed in MEGA11 [44] and CodonW (version 1.4.2, http://codonw.sourceforge.net/, accessed on 10 November 2023). RSCU values were calculated for all synonymous codons.
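The RSCU definition can be illustrated with a minimal Python sketch. It is not the MEGA11 or CodonW implementation used in this study; the function name `rscu`, the use of the standard genetic code, and the toy input sequence are assumptions made only for illustration.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons (an assumption
# for this sketch; RNA editing and non-canonical codons are ignored).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def rscu(cds_sequences):
    """Return {codon: RSCU} for every sense codon whose amino acid is observed."""
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    # Group sense codons into synonymous families.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid not observed, RSCU undefined
        for c in codons:
            # Observed count divided by the count expected under equal usage.
            values[c] = counts[c] * len(codons) / total
    return values


if __name__ == "__main__":
    # Hypothetical toy CDS: three GCT codons and one GCA codon (all alanine).
    for codon, value in sorted(rscu(["GCTGCTGCTGCA"]).items()):
        print(codon, round(value, 2))
```

In this toy input, alanine has four synonymous codons and four observations, so each codon is expected once; GCT is observed three times and therefore has an RSCU above one, while the unused GCC and GCG have an RSCU of zero.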

Synonymous and Nonsynonymous Substitution Ratio
The sequences of the 27 PCGs shared by C. praecox and the other eight species were aligned separately using MAFFT v7.520 [45]. The ratio of nonsynonymous (Ka) to synonymous (Ks) substitutions (Ka/Ks) of the 27 mitochondrial PCGs was calculated using DnaSP v6.12.03 to identify genes under selection pressure [46]. Statistical analysis of the Ka/Ks ratios of the 27 PCGs was performed with one-way ANOVA using R v4.1.2.
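The one-way ANOVA step can be sketched as follows. The study used R v4.1.2; this pure-Python stand-in only computes the F statistic and its degrees of freedom, and the function name `one_way_anova_f` and the example groups are hypothetical values chosen for illustration, not data from this study.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Hypothetical Ka/Ks-like values for three genes (illustrative only).
    example = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    print(one_way_anova_f(example))
```

Converting the F statistic into a p-value requires the F distribution, which a statistics package such as R provides directly.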

Mitogenome Structure, Organization, and Composition
A total of 4.03 Gb of raw reads was generated using the Illumina HiSeq2500 platform (Illumina, San Diego, CA, USA). The primary de novo assembly, performed with SPAdes, generated 2903 contigs with a total length of 11,168,911 bp (Supplementary Table S1). Among them, 2649 contigs were longer than 1000 bp, with a median length of 2412 bp and a maximum length of 79,783 bp (Supplementary Table S1). After manual removal of redundant fragments and gap filling, the mitogenome of C. praecox was assembled into a single circular DNA molecule with a total length of 972,347 bp (Figure 1). This size is close to that of Magnolia biondii (NC_049134.1, 967,100 bp) but nearly triple that of Aristolochia fimbriata (OP649454.1-OP649456.1, 349,849 bp) (Supplementary Table S2).
The nucleotide composition of the mitogenome is A: 26.23%; T: 26.38%; C: 23.89%; and G: 23.51%. The AT and GC contents are 52.61% and 47.39%, respectively, which are similar to those of other magnoliid species (Supplementary Table S3) [52]. Both the AT skew (−0.002851169) and GC skew (−0.008016878) were negative, indicating higher T and C contents than A and G contents in the mitogenome of C. praecox. The mitogenome encodes 60 unique genes, including 40 PCGs, 3 rRNA genes, and 17 tRNA genes (Table 1). The whole mitogenome sequence of C. praecox has been deposited in the GenBank database under accession number OR811177. Despite the significant variation in mitochondrial genome size, the gene set and GC content were consistent among magnoliids [24,52].
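The skew values follow from the usual definitions, AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C). The sketch below applies them to the base percentages reported above; the function names are illustrative assumptions and are not taken from the tools used in this study.

```python
def skews(a, t, g, c):
    """Return (AT skew, GC skew) from base counts or base percentages."""
    at_skew = (a - t) / (a + t)
    gc_skew = (g - c) / (g + c)
    return at_skew, gc_skew


def sequence_skews(sequence):
    """Count bases in a DNA string and return (AT skew, GC skew)."""
    seq = sequence.upper()
    return skews(seq.count("A"), seq.count("T"), seq.count("G"), seq.count("C"))


if __name__ == "__main__":
    # Whole-mitogenome base percentages as reported in the text.
    at_skew, gc_skew = skews(a=26.23, t=26.38, g=23.51, c=23.89)
    print(f"AT skew = {at_skew:.9f}, GC skew = {gc_skew:.9f}")
```

A negative AT skew means T outnumbers A, and a negative GC skew means C outnumbers G, which is the reading given for the whole mitogenome.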

Genomic Features of the C. praecox Mitogenome
Among the PCGs of the C. praecox mitogenome, five genes (nad1, nad2, nad5, atp1, and rps19) were present in two or more copies. Notably, the presence and number of introns within the PCGs varied considerably. The majority of PCGs lacked introns entirely, while eight genes contained introns. Specifically, five genes (nad1, nad2, ccmFc, rps3, and rps10) had only one intron each, while the cox2, nad4, and nad7 genes contained two, three, and four introns, respectively.
The combined length of all PCGs amounted to 35,827 bp, corresponding to approximately 3.7% of the total mitogenome length. The average length of PCGs was 1526 bp, ranging from 21 bp to 10,390 bp (Supplementary Table S2). The nucleotide composition of PCGs was A: 25.34%; T: 25.83%; G: 25.19%; and C: 23.64%, which resulted in an AT content of 51.17% and a GC content of 48.83%. The AT skew (−0.0096233) and GC skew (0.031656) indicated higher T and G contents than A and C contents among the PCGs.
Three ribosomal RNA genes and seventeen transfer RNA genes were identified in the mitogenome. The total length of the rRNA genes was 9504 bp, comprising duplicated copies of rrnL and single copies of rrn5 and rrnS (Table 1, Supplementary Table S1). Of the seventeen unique tRNA genes identified in the mitogenome of C. praecox, ten had multiple copies. Specifically, eight tRNA genes (trnD, trnE, trnF, trnK, trnP, trnQ, trnS, trnY) had two copies each, while trnL had three copies and trnM had five copies. These tRNA genes ranged from 67 bp to 104 bp in length, totaling 2323 bp (Supplementary Table S1).

Codon Usage Analysis
Most amino acids are encoded by two to six different codons, except for tryptophan and methionine, and codon usage bias reflects the mutation patterns and evolution of genes [53]. To better understand the mitogenome features of C. praecox, we performed a comparative analysis of the mitogenome of C. praecox with those of other species, including Ginkgo biloba, Magnolia biondii, Liriodendron tulipifera, Nelumbo nucifera, Oryza sativa (japonica), Aconitum kusnezoffii, Aristolochia fimbriata, and Saururus chinensis [54][55][56][57]. The genome size and number of PCGs varied among species. For instance, Magnolia biondii and Liriodendron tulipifera are magnoliids with similar numbers of PCGs and rRNA genes and similar GC percentages (Supplementary Table S2).
In the mitogenome of C. praecox, a total of 11,894 codons, excluding termination codons, were identified in the 40 PCGs, and the majority of PCGs started with an ATG codon, with a few exceptions (nad1, nad2, nad4L, nad5, rps4, and rps10). Notably, five types of stop codons (CGG, AAA, TAT, GTA, and GGT) were found in the mitogenome of C. praecox in addition to the traditional stop codons (TAA, TAG, and TGA). The codon usage analysis revealed that arginine (Arg), serine (Ser), and leucine (Leu) are the most common amino acid residues, whereas tryptophan (Trp) is the least-used amino acid residue in the mitogenome (Figure 2, Supplementary Figure S1 and Table S3).
Comparing the mitogenome of C. praecox with those of the other eight species showed that the proportions of most amino acids are similar among all species, except for alanine (Ala), cysteine (Cys), proline (Pro), methionine (Met), and Leu. The percentages of Ala, Met, and Pro in C. praecox were lower than those in the other species, while the proportion of Cys was higher than those in the other species (Figure 3).
Relative synonymous codon usage (RSCU) analysis was conducted to assess codon usage bias. The RSCU values of codons ending in U or A (NNU and NNA) were greater than one, with a few exceptions (AUA, CUA, AGU, and GUA). This pattern suggested a strong bias toward A and T in the third codon position of PCGs, which has also been observed in other plant mitogenomes [58,59]. An effective number of codons (ENC) analysis was also performed to assess the degree of codon usage bias. The ENC values of the PCGs ranged from 39.23 to 60.99, indicating a weak codon usage bias in these genes (Supplementary Table S3).

Synonymous and Nonsynonymous Substitution Ratio
The ratio of nonsynonymous to synonymous substitutions (Ka/Ks) was calculated to detect the selection pressure on the PCGs shared between the mitogenome of C. praecox and those of the other species. The Ka/Ks values were determined from the aligned sequences of the shared PCGs among the nine species. The Ka/Ks values of most shared PCGs were less than one, indicating that the synonymous substitution rate exceeded the nonsynonymous substitution rate and that these genes were likely highly conserved during evolution. Only two genes (nad2 and rps7) showed an average Ka/Ks ratio greater than one, suggesting that they may have undergone positive selection, with changes tending to be retained or fixed during evolution (Figure 4). The remaining genes showed an average Ka/Ks ratio of less than one. For example, all Ka/Ks values of 11 genes (atp1, atp6, ccmC, cox2, cox3, nad3, nad4L, nad5, nad7, nad9, and rps12) were less than one, indicating that these genes were highly conserved and under purifying selection. To evaluate the negative selection of genes, one-way ANOVA was performed using rps7 as the control, in which most values were close to one. In total, eight genes (atp1, cob, cox1, nad1, nad2, nad4, nad5, and nad6) showed statistically significant differences (p-value < 0.05, Supplementary Table S4).
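The two readings of Ka/Ks used here, the average ratio per gene and whether every pairwise value stays below one, can be sketched as follows. The gene names and values in the example are hypothetical placeholders, not results from this study, and the function name `summarize_ka_ks` is an assumption for illustration.

```python
from statistics import mean


def summarize_ka_ks(values_by_gene):
    """Summarize pairwise Ka/Ks values per gene.

    Returns {gene: {"mean": float, "all_below_one": bool, "regime": str}}.
    """
    summary = {}
    for gene, values in values_by_gene.items():
        average = mean(values)
        if average > 1:
            regime = "candidate positive selection"
        elif average < 1:
            regime = "purifying selection"
        else:
            regime = "neutral"
        summary[gene] = {
            "mean": average,
            # True only when every pairwise comparison is below one.
            "all_below_one": all(v < 1 for v in values),
            "regime": regime,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical pairwise Ka/Ks values for two placeholder genes.
    toy = {"geneA": [0.2, 0.4, 0.6], "geneB": [0.8, 1.6, 1.2]}
    for gene, info in summarize_ka_ks(toy).items():
        print(gene, info)
```

A gene can have an average below one while still containing individual pairwise values above one, which is why the two summaries are reported separately.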

Repetitive Sequence Analysis
The variation in mitogenome size in plants can be partly explained by different forms of repeats, such as tandem repeats, SSRs, and dispersed repeats, the last of which mainly include forward, reverse, complement, and palindromic sequences. The activation and prevalence of repeated sequences play a pivotal role in remodeling the size and structure of plant mitogenomes [60,61]. Using the REPuter online program, a total of 2562 dispersed repeat sequences were found in the mitochondrial genome of C. praecox (Figure 5, Supplementary Table S5). The total length of the dispersed repeat sequences was 165,069 bp, which accounted for 16.98% of the total genome size, and the repeat length ranged from 30 bp to 10,906 bp. The repeat number was greater than that in L. tulipifera (497), O. sativa (238), and M. biondii (1295) but lower than that in N. nucifera (4759) and G. biloba (3529). Most repeats were 30 bp to 100 bp long, with few large repeats. For instance, seven species (excluding G. biloba and O. sativa) contained large repeats of more than 10,000 bp, while M. biondii and N. nucifera contained more repeats of less than 200 bp. Notably, O. sativa contained the lowest number of repeats (238) but exhibited the longest repeat (45,584 bp). The dispersed repeats in the nine species were mainly of the forward and palindromic types, and their numbers were similar in each species. The total length of repeats was also variable among species. Specifically, the total repeat length of N. nucifera was 301,185 bp, the largest of the nine species, accounting for 57.39% of the total genome size. A similar pattern was found in G. biloba and O. sativa, in which repeats accounted for 39.89% and 30.74% of the total genome size, respectively. In the remaining species, the proportion ranged from 8.7% (L. tulipifera) to 17.88% (S. chinensis).
Life 2024, 14, 182
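The four dispersed-repeat classes can be made concrete with a short sketch. It follows the usual REPuter-style definitions (forward: identical copy; reverse: reversed copy; complement: complemented copy; palindromic: reverse-complemented copy) and is not the REPuter algorithm itself; the function names and the toy sequences are assumptions for illustration. The last helper reproduces the repeat proportion from the reported lengths.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_partners(unit):
    """Return the sequence a second copy must have for each repeat class."""
    unit = unit.upper()
    complement = unit.translate(COMPLEMENT)
    return {
        "forward": unit,
        "reverse": unit[::-1],
        "complement": complement,
        "palindromic": complement[::-1],  # reverse complement
    }


def classify_repeat(first, second):
    """List every repeat class under which `second` matches `first`."""
    partners = repeat_partners(first)
    return [name for name, seq in partners.items() if seq == second.upper()]


def percent_of_genome(repeat_length, genome_length):
    """Share of the genome covered by repeats, in percent."""
    return 100 * repeat_length / genome_length


if __name__ == "__main__":
    print(repeat_partners("AACG"))
    print(classify_repeat("AACG", "CGTT"))
    # Reported total dispersed-repeat length and mitogenome length.
    print(round(percent_of_genome(165069, 972347), 2))
```

A short, self-complementary unit can match more than one class at once, which is why `classify_repeat` returns a list rather than a single label.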

Figure 1. The complete mitogenome of C. praecox. Based on gene functions, the genes are represented as color bars in the circle. Genes shown outside the circle are transcribed counterclockwise, whereas those inside are transcribed clockwise, as shown by the gray arrow. The gray bars in the light gray circle indicate the percentage of GC content.

Figure 2. Codon usage analysis of protein-coding genes (PCGs) in the mitogenome of C. praecox. Relative synonymous codon usage (RSCU) values are plotted on the y-axis.

Figure 3. Relative amino acid percentage of C. praecox compared with the other six species. The y-axis indicates the percentage of amino acids in the CDS in each species.

Figure 4. Box plot of the pairwise nonsynonymous-to-synonymous substitution ratio (Ka/Ks) for the shared PCGs of the nine mitogenomes. The y-axis indicates the Ka/Ks value, while the x-axis indicates the shared PCGs. Points indicate outliers.

Figure 5. The repetitive sequence analysis. The figure on the left shows the distribution of dispersed repeats in nine species, while the figure on the right shows the distribution of SSRs. Different colors in the legend of the left figure represent the repeat size. In contrast, in the right figure, the numbers in the legend (1-6) correspond to mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide repeats of SSR sequences, respectively. The x-axis indicates the number of dispersed repeats (left) or SSRs (right).

Table 1. The genes present in the mitochondrial genome of C. praecox.
