Agronomy
  • Article
  • Open Access

28 October 2025

Complete Chloroplast Genome Sequence Structure and Phylogenetic Analysis of Brassica juncea var. multiceps (Brassicaceae)

1
Zhejiang Institute of Landscape Plants and Flowers, Zhejiang Academy of Agricultural Sciences, Hangzhou 311251, China
2
Ningbo Academy of Agricultural Sciences, Ningbo 315040, China
3
Key Laboratory of Quality and Safety Control for Subtropical Fruit and Vegetable, Ministry of Agriculture and Rural Affairs, Collaborative Innovation Center for Efficient and Green Production of Agriculture in Mountainous Areas of Zhejiang Province, College of Horticulture Science, Zhejiang A&F University, Hangzhou 311300, China
*
Author to whom correspondence should be addressed.
This article belongs to the Special Issue Germplasm Conservation and Genetic Improvement in Tropical and Subtropical Crops

Abstract

Brassica juncea var. multiceps (Xuelihong), a variety of B. juncea (L.) Czern., holds considerable nutritional and economic value. However, its complete chloroplast genome and its evolutionary relationships within Brassicaceae remain poorly characterized. Using Illumina NovaSeq 6000 high-throughput sequencing, we assembled and annotated the complete chloroplast genome sequence of B. juncea var. multiceps. The genome is 153,483 bp in length, with 36.36% GC content, and encodes 132 genes. Codon usage analysis identified leucine (Leu) as the most abundant amino acid. Thirty-one codons had relative synonymous codon usage (RSCU; a metric for codon preference) values greater than one, with 93.55% of these preferred codons ending in A/U. We detected 37 dispersed repeats (14 forward, 18 palindromic, 3 reverse, and 2 complementary) and 315 simple sequence repeats (SSRs), with mononucleotide SSRs dominating (72.70%). Analysis of the Ka/Ks ratio, a measure of selection pressure (where values greater than one indicate positive selection), indicated that the ycf1, ycf2, and ndhF genes may have undergone positive selection. Nucleotide diversity analysis revealed five hypervariable regions, which are useful for phylogenetic studies. Phylogenetic analysis of 26 Brassicaceae species revealed that B. juncea var. multiceps is closely related to B. juncea. Notably, this is the first complete chloroplast genome of B. juncea var. multiceps, with unique hypervariable regions not reported in other B. juncea varieties. These findings clarify evolutionary relationships in Brassicaceae, provide molecular markers for the genetic breeding of B. juncea var. multiceps, and enhance our understanding of chloroplast genome adaptation in Brassica.

1. Introduction

Chloroplasts are essential photosynthetic organelles, critical for plant physiology []. The chloroplast genome exhibits maternal inheritance, structural stability, and a moderate rate of sequence evolution, traits that make it ideal for resolving plant phylogenetic relationships []. The plant chloroplast genome is a circular, double-stranded DNA molecule of approximately 120 to 160 kb. It comprises four regions: a pair of inverted repeats (IRa and IRb), a large single-copy (LSC) region, and a small single-copy (SSC) region []. It contains approximately 110 to 130 genes that are critical for photosynthesis, biosynthesis, and other core metabolic processes []. Advances in high-throughput sequencing technology have transformed research on plant chloroplast genomes. The National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/; accessed on 15 July 2025) now hosts extensive genomic data, which support species classification, evolutionary research, and genetic improvement.
Brassicaceae, a flagship angiosperm lineage, comprises approximately 351 genera and 3977 species globally []. This family includes the model plant Arabidopsis thaliana (L.) Heynh. and economically vital Brassica L. crops []. Brassica species have undergone complex genome polyploidization events. The well-known “U-triangle hypothesis” reveals the genetic relationships and evolutionary trajectories among B. rapa L., B. oleracea L., and B. nigra (L.) W.D.J.Koch and their amphidiploid derivatives (B. napus L., B. juncea (L.) Czern., and B. carinata A.Braun) []. As a key component of the “U-triangle hypothesis”, B. juncea has diversified into numerous varieties [].
Brassica juncea var. multiceps Tsen et Lee (Xuelihong) is a variety of B. juncea that is primarily cultivated in the Yangtze River Basin of China []. Rich in vitamins, minerals, and dietary fiber, it is widely consumed either in fresh or pickled form [,,,]. To date, no complete chloroplast genome sequence of B. juncea var. multiceps has been reported. Due to long-term artificial selection and environmental adaptation, B. juncea var. multiceps exhibits significant genetic divergence from other members of the Brassicaceae. Chloroplast genomes provide valuable insights into the evolutionary history of plants. The complete chloroplast genome sequences of several Brassicaceae species, such as A. thaliana [], Raphanus sativus L. [], B. oleracea [], and B. rapa var. purpuraria (L.H.Bailey) Kitam [], have been generated and characterized. Understanding the structural features and evolutionary connections of the B. juncea var. multiceps chloroplast genome is crucial for deciphering the genetic diversity and complex phylogenetic relationships within the Brassicaceae lineage.
For this investigation, we sequenced and assembled the complete chloroplast genome sequence of B. juncea var. multiceps. We systematically analyzed the structural features of this chloroplast genome, including the dynamic variations in gene content, repeat sequence distribution, and boundary dynamics of the IR regions. Using chloroplast genome data, we constructed a phylogenetic tree to resolve the evolutionary position of B. juncea var. multiceps within Brassicaceae. Furthermore, we explored the potential association between chloroplast genomic variation and artificial selection pressure, providing a cytoplasmic genetic perspective to elucidate the molecular mechanisms underlying the adaptive evolution of B. juncea var. multiceps. This study characterizes the evolutionary patterns and functional traits of the B. juncea var. multiceps chloroplast genome, thereby establishing a theoretical foundation for genetic resource conservation, cultivar identification, and chloroplast genome-based molecular breeding of Brassicaceae species.

2. Materials and Methods

2.1. Materials and Sequencing

The B. juncea var. multiceps cultivar “ZJD-24” used in this experiment was grown in the experimental field of the Zhejiang Institute of Landscape Plants and Flowers (30°4′ N, 120°13′ E). In February 2025, leaves were separately collected from three healthy plants and immediately frozen in liquid nitrogen for DNA preservation.
High-quality genomic DNA was obtained using a universal plant DNA extraction kit (Cat. No. D312, Nanjing Genepioneer Biotechnologies Co., Ltd., Nanjing, China) following the manufacturer’s protocol. The quality and concentration of the DNA were verified using three methods: (1) 1% agarose gel electrophoresis (to confirm intact, high-molecular-weight DNA without degradation, visualized under UV light with clear, single bands); (2) a Nanodrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA) (to ensure A260/A280 ratios of 1.8–2.0, indicating low protein contamination, and A260/A230 ratios > 2.0, indicating minimal polysaccharide or salt contamination); and (3) a Qubit 4.0 fluorometer (Thermo Fisher Scientific, USA) (to quantify DNA concentration, ensuring that each sample had ≥500 ng of qualified DNA for sequencing).
Paired-end sequencing (150 bp read length) was performed on an Illumina NovaSeq 6000 platform provided by Nanjing Genepioneer Biotechnologies Co., Ltd. The average sequencing depth was 6841.6368×.

2.2. Chloroplast Genome Assembly and Functional Annotation

Raw sequencing data were filtered using fastp v0.23.4 [] according to the following criteria: (1) sequencing adapters and primer sequences were trimmed from the reads; (2) reads with an average quality score below Q5 were removed; and (3) reads containing more than 5 Ns (ambiguous bases) were removed. The chloroplast genome was then assembled with GetOrganelle v1.7.7.1 []. Because chloroplast genomes are generally conserved but can undergo rearrangement, we checked the accuracy of the assembly in three ways: (1) reads were mapped to the assembled genome, and statistics such as genome coverage and insert size were calculated; (2) the genome was aligned with the reference sequence (NC_040849.1), and collinearity was examined to assess conservation and rearrangement; and (3) the structure of the genome was compared with that of the reference sequence, and the differences between the two were analyzed.
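The two read-level thresholds listed above can be illustrated with a minimal Python sketch. This is not how fastp is implemented; the function names and the Phred+33 encoding are assumptions made only for illustration, and adapter trimming (criterion 1) is not covered.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def keep_read(seq: str, quality: str, min_mean_q: float = 5, max_n: int = 5) -> bool:
    """Return True if a read passes both thresholds quoted in the text."""
    if not seq or len(seq) != len(quality):
        return False  # empty or malformed records are dropped
    if mean_phred(quality) < min_mean_q:
        return False  # criterion (2): average quality below Q5
    if seq.upper().count("N") > max_n:
        return False  # criterion (3): more than 5 ambiguous bases
    return True
```

For example, a read with six `N` bases is rejected regardless of its quality string, whereas a read with exactly five is kept.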
For genome annotation, the protein-coding genes (PCGs) of the chloroplast genome were identified using Prodigal v2.6.3 [], ribosomal RNA (rRNA) genes were predicted via HMMER v3.1b2 [], and transfer RNA (tRNA) genes were annotated using ARAGORN v1.2.38 []. A complete visualization of the chloroplast genome was created using OGDRAW v1.3.1 [], and codon usage bias was calculated using a custom Perl script.

2.3. Analysis of Dispersed Repetitive Sequences and Simple Sequence Repeats

Repetitive sequences were identified using vmatch v2.3.0 combined with Perl scripts, with the following stringent parameters applied: minimum sequence length of 30 bp and a maximum Hamming distance of 3. Simple sequence repeats (SSRs) were detected using MISA v1.0 software [], with parameters set as 1–8 (mononucleotide repeats with ≥8 repeats), 2–5, 3–3, 4–3, 5–3, and 6–3.
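To make the SSR thresholds concrete, the following Python sketch scans a sequence for perfect tandem repeats using the same unit-length/minimum-repeat pairs (1–8, 2–5, 3–3, 4–3, 5–3, 6–3). It is a simplified illustration rather than MISA itself: the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical, compound SSRs are not merged, and motifs that are themselves repeats of a shorter unit (such as `AA`) are skipped so that a run is reported only under its shortest unit.

```python
# Unit length -> minimum number of repeats, as quoted in the text.
MIN_REPEATS = {1: 8, 2: 5, 3: 3, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq: str, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, repeat count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        i = 0
        while i + unit_len <= len(seq):
            motif = seq[i:i + unit_len]
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            # Extend the run while the next unit equals the motif.
            j = i + unit_len
            while seq[j:j + unit_len] == motif:
                j += unit_len
            repeats = (j - i) // unit_len
            if repeats >= min_rep:
                hits.append((i + 1, motif, repeats))
                i = j  # continue after the reported run
            else:
                i += 1
    return sorted(hits)
```

Under these assumptions, `find_ssrs("G" + "T" * 8 + "C")` reports a single mononucleotide SSR starting at position 2, while a run of only seven `T` bases is not reported.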

2.4. Analysis of Chloroplast Genome Nucleotide Diversity, Ka/Ks, Boundary, and Collinearity

Chloroplast genome sequences of seven Brassicaceae species, namely A. thaliana (GenBank accession: KX551970.1), B. carinata (MW628493.1), B. juncea (KT581449.1), B. napus (NC_016734.1), B. nigra (KT878383.1), B. oleracea (MT721156.1), and B. rapa (NC_040849.1), were downloaded from NCBI (accessed on 15 April 2025). The selected species cover key lineages of Brassicaceae, enabling the capture of the family’s hierarchical diversity. A. thaliana, a well-characterized model organism, serves as an outgroup within Brassicaceae to provide a baseline for evolutionary comparisons. The remaining six species belong to the genus Brassica and comprise all three diploid progenitor species of the “U-triangle hypothesis” (B. rapa, B. oleracea, B. nigra) and their three amphidiploid derivatives (B. juncea, B. napus, B. carinata). The inclusion of these species allows analysis of nucleotide diversity patterns across ploidy levels (diploid vs. amphidiploid) and of the evolutionary relationships defined by the U-triangle hypothesis, which is critical for clarifying the taxonomic context of B. juncea var. multiceps within its genus. Cross-species alignment of homologous gene sequences was performed using MAFFT v7.427 []. Nucleotide diversity (Pi) values for individual genes were calculated with DnaSP v5 [], whereas nonsynonymous/synonymous substitution (Ka/Ks) ratios were computed via KaKs_Calculator v2.0 []. Boundaries of the chloroplast genome were visualized using CPJSdraw v1.0.0. Whole-genome alignments were conducted with Mauve v2.3.1 [].

2.5. Phylogenetic Analysis

Chloroplast genomes of 26 Brassicaceae taxa, including core crop lineages (e.g., B. rapa and B. oleracea) and representative non-crop groups (e.g., R. sativus and Eruca vesicaria (L.) Cav. subsp. sativa (Mill.) Thell.), were retrieved from the NCBI database (accessed on 22 April 2025). The APG IV system [] explicitly delineates Brassicaceae (the mustard family) and Caricaceae (the family containing Carica papaya L.) as two independent, mutually exclusive monophyletic families within the order Brassicales; this relationship, characterized by “close affinity yet distinct taxonomic boundaries”, makes C. papaya a suitable outgroup. Furthermore, Zhang et al. [] and Zhou et al. [] also employed C. papaya as the outgroup in their respective phylogenetic analyses of Brassicaceae species. C. papaya was therefore selected as the outgroup in this study. Interspecific multiple-sequence alignment was performed using MAFFT v7.427 (--auto mode) []. The maximum likelihood (ML) phylogenetic tree was constructed using RAxML v8.2.10 [] with the GTRGAMMA model, and rapid bootstrap analysis was conducted with 1000 replicates. Additionally, we constructed a phylogenetic tree using Bayesian inference (BI) in MrBayes v3.2.7 []. The Markov Chain Monte Carlo (MCMC) run was set to 1,000,000 generations, with sampling every 100 generations. Upon run completion, the first 25% of sampled trees were discarded as burn-in, and a majority-rule consensus tree was constructed from the remaining trees.

3. Results

3.1. Genomic Architecture of B. juncea var. multiceps Chloroplasts

The chloroplast genome of B. juncea var. multiceps (PX240752.1) exhibits the typical quadripartite structure of angiosperm chloroplasts, with a total length of 153,483 bp (Figure 1, Table 1). This structure comprises an LSC region (83,286 bp), an SSC region (17,775 bp), and a pair of identical IRs (IRa and IRb, each 26,211 bp). The nucleotide composition analysis revealed the following base frequency values: adenine (A) = 31.35%, cytosine (C) = 18.51%, guanine (G) = 17.85%, and thymine (T) = 32.39%. The chloroplast genome’s overall GC content was 36.36%. Notably, the IR regions had a significantly higher GC content (42.34%) than the LSC (34.12%) and SSC (29.20%) regions (Table 1). This GC enrichment in IRs is consistent with the presence of rRNA genes in these regions; rRNA genes require a high GC content to maintain stable secondary structures, which is critical for chloroplast ribosome assembly and protein translation.
Figure 1. Circular map of the Brassica juncea var. multiceps chloroplast genome. Genes shown on the outer circle are expressed in the forward direction, while those on the inner circle are encoded in reverse. Functional categories are denoted by different colors. The inner dark gray circle shows the distribution of GC content, and the light gray circle shows the AT content percentage. The four regions (LSC, SSC, IRa, and IRb) are annotated on the inner circle.
Table 1. Chloroplast genome characteristics of B. juncea var. multiceps.
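The figures reported in this subsection can be cross-checked with simple arithmetic. The Python sketch below uses only values quoted in the text; the helper names are hypothetical. It confirms that the four regions sum to the total genome length and that the length-weighted mean of the regional GC contents agrees with the overall GC content after rounding.

```python
# Values copied from the text (lengths in bp, GC content in percent).
LSC, SSC, IR = 83_286, 17_775, 26_211
GC_BY_REGION = {"LSC": 34.12, "SSC": 29.20, "IR": 42.34}


def total_length(lsc: int, ssc: int, ir: int) -> int:
    """Quadripartite genome length: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


def weighted_gc(lsc: int, ssc: int, ir: int, gc: dict) -> float:
    """Length-weighted mean GC content across the four regions."""
    weighted_sum = lsc * gc["LSC"] + ssc * gc["SSC"] + 2 * ir * gc["IR"]
    return weighted_sum / total_length(lsc, ssc, ir)


def gc_percent(seq: str) -> float:
    """GC content (%) of a nucleotide string, counting only A, C, G, and T."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return 100 * (seq.count("G") + seq.count("C")) / acgt


print(total_length(LSC, SSC, IR))                          # 153483
print(round(weighted_gc(LSC, SSC, IR, GC_BY_REGION), 2))   # 36.36
print(round(18.51 + 17.85, 2))                             # 36.36 (C + G frequencies)
```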

3.2. Functional Annotation of Chloroplast Genes in B. juncea var. multiceps

The chloroplast genome of B. juncea var. multiceps contained 132 annotated genes: 87 PCGs, 37 tRNA genes, and 8 rRNA genes. These genes primarily function in photosynthesis and chloroplast self-replication (Table S1). Within this genome, 72 PCGs and 23 tRNA genes exist as single-copy sequences. Additionally, eight protein-coding genes, six tRNA genes, and four rRNA genes are present in duplicate. Notably, the trnE-UUC gene occurs in three copies, while trnM-CAU is found in four copies. Among genes containing introns, eleven PCGs and eight tRNA genes each harbor a single intron, whereas four PCGs contain two introns per gene (Table S1).

3.3. Codon Preference Analysis

A comprehensive analysis of the B. juncea var. multiceps chloroplast genome revealed that 22,773 codons were used for amino acid translation. Among these, leucine (Leu) codons were the most abundant, totaling 2417 instances. Codons for isoleucine (Ile) and serine (Ser) followed, with 1976 and 1694, respectively (Table S2). The relative synonymous codon usage (RSCU) analysis showed 29 of 31 codons with RSCU > 1 end in A/U and 30 of 33 codons with RSCU < 1 end in G/C. This A/U preference is a hallmark of chloroplast genomes. The RSCU analysis revealed distinct patterns. The AUG codon acts as the primary translation initiation signal; it showed the highest RSCU value of 6.9594. This overrepresentation stems from the genomic architecture, as every protein requires a start codon. In contrast, GUG is a rare alternative start codon for methionine; it displayed the lowest RSCU value of 0.0406. This low frequency suggests poor recognition by the chloroplast translation initiation machinery. As expected, UGG, the sole codon for tryptophan, had an RSCU of 1.0. This result aligns with the universal genetic code, where tryptophan has no synonymous codons (Figure 2).
Figure 2. Examination of relative synonymous codon usage (RSCU) within the chloroplast genome of B. juncea var. multiceps. Squares depict codons for corresponding amino acids.
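For readers unfamiliar with the metric, RSCU is commonly defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below implements this standard definition with the standard genetic code; it is not the authors' Perl script, and the function names are hypothetical. Because it groups codons strictly by encoded amino acid, it assigns single-codon families such as AUG and UGG a value of 1.0, so it would not reproduce the start-codon values reported above, which presumably reflect a different grouping.

```python
from collections import defaultdict

# Standard genetic code in UCAG order ("*" marks stop codons).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def count_codons(cds: str) -> dict:
    """Count in-frame codons of a coding sequence (DNA or RNA letters)."""
    rna = cds.upper().replace("T", "U")
    counts = defaultdict(int)
    for pos in range(0, len(rna) - len(rna) % 3, 3):
        codon = rna[pos:pos + 3]
        if codon in CODON_TABLE:  # skip codons with ambiguous bases
            counts[codon] += 1
    return dict(counts)


def rscu(counts: dict) -> dict:
    """RSCU = codon count * family size / total count of the synonymous family."""
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        families[amino_acid].append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts.get(codon, 0) for codon in codons)
        if total == 0:
            continue  # amino acid (or stop) absent from the input
        for codon in codons:
            values[codon] = counts.get(codon, 0) * len(codons) / total
    return values
```

As a toy example, a sequence with three UUA codons and one UUG codon gives RSCU values of 4.5 and 1.5 for those two codons and 0 for the four unused leucine codons.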

3.4. Analysis of Repetitive Sequences

In the chloroplast genome of B. juncea var. multiceps, 37 interspersed repetitive sequences were identified, including 14 forward (F) repeats, 18 palindromic (P) repeats, 3 reverse (R) repeats, and 2 complementary (C) repeats (Figure 3). Among these repetitive sequences, those that were 30 bp in length were the most abundant, with 11 occurrences in total. Sequences of 32 bp were the second most frequent, accounting for five occurrences. The longest identified repetitive sequence was 26,211 bp. This represented a single copy of the IR region, consistent with the chloroplast’s quadripartite structure. The lengths of the remaining repetitive sequences ranged from 30 bp to 58 bp (Figure 3). These were distributed across noncoding regions of the LSC and SSC. Their presence suggests that they potentially play roles in mediating genome recombination or regulating gene expression.
Figure 3. Statistical chart of interspersed repeat sequences in the B. juncea var. multiceps chloroplast genome. F = forward repeats; P = palindromic repeats; R = reverse repeats; C = complementary repeats.
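The four repeat categories can be distinguished by how the second copy relates to the first. The sketch below uses the conventional definitions (forward: same orientation; reverse: reversed; complementary: complemented; palindromic: reverse-complemented) together with the Hamming-distance threshold of 3 given in the Methods. It only classifies a pair of equal-length segments that has already been located; it does not search a genome as vmatch does, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def repeat_types(first: str, second: str, max_hamming: int = 3) -> list:
    """Return the repeat categories (F, R, C, P) that the pair satisfies."""
    first, second = first.upper(), second.upper()
    candidates = {
        "F": second,                               # forward: same orientation
        "R": second[::-1],                         # reverse
        "C": second.translate(_COMPLEMENT),        # complementary
        "P": second.translate(_COMPLEMENT)[::-1],  # palindromic (reverse complement)
    }
    return [
        label
        for label, target in candidates.items()
        if hamming(first, target) <= max_hamming
    ]
```

For example, with `max_hamming=0` the pair `AAGCAAGG` / `CCTTGCTT` is classified as palindromic only, because the second segment is the exact reverse complement of the first.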
SSRs are short tandem repeats consisting of 1–6 nucleotide units. A total of 315 SSRs were identified in the B. juncea var. multiceps chloroplast genome, with 197 located in the LSC region, 72 in the SSC region, and 46 in the IR region. The total length of the SSRs was 2956 bp, dominated by mononucleotide repeats (229, mostly A/T-rich), followed by trinucleotide (63), dinucleotide (17), and tetranucleotide (4) repeats. Among the 315 SSRs, the three most abundant repeat motifs were T(8) (15.56%), A(8) (13.65%), and T(9) (13.33%) (Figure 4, Table S3). This A/T bias is typical of chloroplast SSRs.
Figure 4. Statistical analysis of the number of each SSR type in the B. juncea var. multiceps chloroplast genome.

3.5. Selection Pressure Analysis

The comparative genomic analysis of B. juncea var. multiceps and seven other Brassicaceae species revealed a mean Ka/Ks ratio of 0.14 (Table S4). This low value indicates strong purifying selection on most chloroplast genes. As exceptions, several gene comparisons showed Ka/Ks > 1 (indicative of positive selection): ycf2 (Ka/Ks = 1.16) vs. B. nigra, ndhF (Ka/Ks = 1.29) vs. B. oleracea, and ycf1 (1.51) and ycf2 (1.16) vs. B. carinata. Positive selection on these genes may reflect adaptive evolution.

3.6. Comparison of Chloroplast Genome Sequences

Analysis of chloroplast genome nucleotide diversity (Pi) across eight Brassicaceae species showed a mean Pi value of 0.0075 (Figure 5, Table S4). Nucleotide diversity varied markedly across different regions of the chloroplast genome. The LSC region exhibited Pi values ranging from 0 to 0.0359, with a mean of 0.0074, while the SSC region showed values from 0 to 0.0341 and a mean of 0.0138. In stark contrast, the IR regions displayed significantly lower diversity, with Pi values ranging from 0 to only 0.0081 and a mean of 0.0022 (Figure 5). Nucleotide diversity analysis identified five highly variable regions: three in the LSC region (rps16, matK, and rpl22) and two in the SSC region (ycf1 and ccsA). Among these, the rps16 gene exhibited the highest genetic diversity.
Figure 5. Nucleotide diversity (Pi) analysis of eight chloroplast genomes.
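Nucleotide diversity is conventionally the average number of pairwise differences per site across all pairs of aligned sequences. The following Python sketch computes it for a small alignment. It is an illustration rather than a reimplementation of DnaSP: the function name is hypothetical, and columns containing gaps or ambiguous bases are simply excluded, which is an assumption about how such sites are handled.

```python
from itertools import combinations


def nucleotide_diversity(aligned: list) -> float:
    """Average pairwise differences per site over gap-free alignment columns."""
    if len(aligned) < 2:
        raise ValueError("at least two sequences are required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    seqs = [seq.upper() for seq in aligned]
    # Keep only columns where every sequence has an unambiguous base.
    usable = [i for i in range(length) if all(seq[i] in "ACGT" for seq in seqs)]
    if not usable:
        raise ValueError("no usable alignment columns")
    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    differences = sum(
        sum(a[i] != b[i] for i in usable) for a, b in combinations(seqs, 2)
    )
    return differences / (n_pairs * len(usable))
```

With three aligned sequences `ACGT`, `ACGA`, and `ACGT`, two of the three pairs differ at one of four sites, giving 2 / (3 × 4) ≈ 0.167.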
In chloroplast genome evolution, shifts in IR boundaries are a primary driver of variation in genome size among plant species. The analysis of chloroplast genomic boundaries in eight Brassicaceae species revealed that their chloroplast genome sizes ranged from 152,869 bp in B. napus to 154,515 bp in A. thaliana. Similarly, the lengths of the IR region varied from 26,035 bp in B. napus to 26,261 bp in A. thaliana (Figure 6). All eight examined Brassicaceae species shared four conserved chloroplast genome junctions: JLB (LSC/IRb), JSB (IRb/SSC), JSA (SSC/IRa), and JLA (IRa/LSC). In all eight species, the JLB junction was located within the rps19 coding sequence, with 165–167 bp of rps19 residing in the LSC region and 112–114 bp in the IRb region. The JSB junction was located within the coding regions of the ycf1 and ndhF genes, with most of ycf1 in the IRb region (only 2–3 bp in the SSC region) and most of ndhF in the SSC region (only 36–37 bp in the IRb region), and there was a 39 bp overlap between ycf1 and ndhF. The JSA junction was located within the ycf1 gene, with 4271–4358 bp of ycf1 in the SSC region and 1027–1030 bp in the IRa region. In all eight species, trnH was located in the LSC region, 2–3 bp away from the JLA boundary (Figure 6). These findings indicate that, despite minor IR expansion and contraction, the overall variation in the IR regions of these Brassicaceae species was minimal.
Figure 6. IR/SC boundary analysis of eight Brassicaceae species. Fine lines indicate regional junctions, with nearby gene data displayed in the figure.
A comparative analysis of chloroplast gene organization and gene arrangement order across the eight studied Brassicaceae species showed that their gene structures were highly conserved, with no obvious gene rearrangement or inversion (Figure 7). This finding indicates that the chloroplast genomes of Brassicaceae species maintain a high degree of constancy throughout evolutionary processes.
Figure 7. Analysis of chloroplast sequence homology among eight Brassicaceae plants. The illustration employs hues to represent genomic loci: CDS (white), tRNA (green), and rRNA (red). The shaded local collinear blocks highlight regions with homologous alignment to another genome. If a block appears above the central axis, it signifies forward alignment relative to the reference genome’s sequence, while those below the axis represent reverse-complement alignment. Unshaded areas indicate no detectable homology between the compared genomes. Mauve generates a similarity profile within each aligned block, where the profile’s peak height reflects the average sequence conservation in that region. Connecting lines between the colored blocks illustrate collinear relationships, showing conserved synteny across the genomes.

3.7. Phylogenetic Relationship Analysis

Chloroplast genomes of 26 Brassicaceae species and the outgroup C. papaya were retrieved from the GenBank database of NCBI. To ensure robust support for subsequent analyses, all selected chloroplast genomes met strict quality criteria: (1) they were complete and circular chloroplast genome sequences; (2) all PCGs, rRNA genes, and tRNA genes were fully annotated; and (3), to ensure traceability, detailed accession numbers of these genomes are provided in Figure 8. This rigorous data selection process ensures that the genomic data used in this study provide reliable support for phylogenetic reconstruction and other downstream analyses.
Figure 8. Phylogenetic analysis derived from chloroplast genome sequences. (A) The phylogenetic tree was constructed via the maximum likelihood (ML) method with 1000 bootstrap replicates, and the values on each branch of the tree represent the bootstrap support values. (B) The phylogenetic tree was constructed using Bayesian inference (BI). The values labeled on each branch of the tree (B) represent posterior probabilities (PPs), which are used to quantify the confidence level of the evolutionary relationship for that branch.
Phylogenetic trees were constructed using two methods: BI and ML (Figure 8). The trees generated via these two approaches displayed consistent topologies with high support values, with only minor discrepancies observed at select nodes. In the ML tree, B. juncea and B. juncea var. multiceps formed a sister group, and the node support value between this sister clade and B. rapa was 99%, whereas the corresponding node support in the BI tree reached 100%. Within the genus Brassica, B. juncea and B. juncea var. multiceps shared the closest evolutionary affinity, followed sequentially by B. rapa, B. oleracea, and B. napus. Among taxa outside the genus Brassica, R. sativus (genus Raphanus) showed the closest evolutionary affinity to B. juncea var. multiceps, followed by E. vesicaria subsp. sativa (genus Eruca). As expected for an outgroup, C. papaya displayed the farthest evolutionary distance from B. juncea var. multiceps. Notably, the phylogenetic topology recovered the genus Brassica as non-monophyletic: non-Brassica genera (e.g., Raphanus, Eruca) were nested within the broader clade that included all sampled Brassica species, rather than forming a distinct monophyletic group exclusive to Brassica.

4. Discussion

The chloroplast genome is a key carrier of cytoplasmic inheritance in plants. Its structural characteristics and evolutionary patterns are crucial to revealing the phylogenetic relationships among species. These characteristics play a pivotal role in clarifying the adaptive evolution mechanism of species, and they hold great significance for exploring the potential of genetic improvement in species []. In this research, the B. juncea var. multiceps chloroplast genome was sequenced and inspected to reveal its genomic structural characteristics, codon usage patterns, repetitive sequence distribution, and phylogenetic relationships with other species in the Brassicaceae lineage.
The complete chloroplast genome of B. juncea var. multiceps is 153,483 bp in length, with a GC content of 36.36%. It has a quadripartite structure, typical of most Brassicaceae species, confirming that the chloroplast genomic structure of Brassicaceae species is highly conserved []. The GC content in the chloroplast genome is a key feature that affects species distribution and environmental adaptability []. In the chloroplast genome of B. juncea var. multiceps, the IR regions have a higher GC content (42.34%) than the LSC (34.12%) and SSC (29.20%) regions. This is also observed in B. juncea [], B. oleracea [], R. sativus [], and A. thaliana [], reflecting different functional requirements of these genomic regions. In terms of gene composition, the chloroplast genome of B. juncea var. multiceps contains 132 annotated genes: 87 PCGs, 37 tRNA genes, and 8 rRNA genes. This gene type and count are nearly identical to those of A. thaliana and other Brassicaceae []. This finding suggests that the essential genes within the chloroplast genome remain remarkably stable throughout evolution.
Codon usage bias exerts a significant influence on chloroplast genome evolution []. In the B. juncea var. multiceps chloroplast genome, favored codons (RSCU > 1; 93.55%) primarily end in A/U, while disfavored codons (RSCU < 1; 90.91%) mostly end in G/C. This phenomenon is common in angiosperms, such as Morina chinensis [] and Cardiospermum halicacabum [], suggesting that it is a widespread trait of plant chloroplasts. This A/U bias may enhance translation efficiency and help chloroplast proteins to function stably under environmental fluctuations. Such conservation in codon use also highlights long-term evolutionary constraints on the chloroplast translation machinery. Plant chloroplast genomes usually contain abundant SSRs and interspersed repetitive sequences, which are well-documented mutation hotspots in genome sequences []. A total of 315 SSRs were detected in the B. juncea var. multiceps chloroplast genome, with a strong bias toward A and T as repeat units. Mononucleotide repeats accounted for the highest proportion (72.70%), followed by trinucleotide repeats (20.00%), while tetranucleotide repeats were the least abundant (1.90%). Additionally, 37 long repetitive sequences were identified, with 97.30% spanning 30–58 bp in length. These SSRs are well suited to developing molecular markers that could distinguish B. juncea varieties and support marker-assisted breeding. The short, interspersed repeats may also drive genome recombination, creating genetic variation that could be useful for improving stress tolerance in B. juncea crops.
The Ka/Ks ratio serves as a key metric in inferring gene evolution and can be used to assess the selection pressure on PCGs []. Using B. juncea var. multiceps as a reference, we conducted a Ka/Ks ratio analysis across seven chloroplast genomes. This analysis showed that most chloroplast genes experience strong purifying selection. In contrast, ycf1, ycf2, and ndhF are notable exceptions, with Ka/Ks ratios exceeding 1. This suggests that these genes have undergone positive selection, potentially leading to functional divergence. The ndhF gene encodes a subunit of NADH dehydrogenase, an enzyme primarily involved in facilitating electron transport and photosynthetic energy conversion []. Its positive selection may improve photosynthetic efficiency, helping B. juncea var. multiceps adapt to the climate of the Yangtze River Basin. Future gene expression analyses and mutation studies could help determine whether ndhF contributes to chloroplast adaptation and genomic stability in Brassica crops []. Nucleotide diversity (Pi) quantifies genetic variation among species, and highly variable regions are useful as population-level molecular markers []. The analysis of chloroplast genome variation revealed distinct patterns of nucleotide diversity in different regions. Across the eight genomes compared, Pi is higher in the LSC (0.0074) and SSC (0.0138) regions than in the IRs (0.0022). This is likely because IR gene duplication buffers mutations, keeping the IRs more conserved. Five highly variable regions were identified through nucleotide diversity analysis, namely matK (0.02211), rpl22 (0.02174), ccsA (0.02327), ycf1 (0.03406), and rps16 (0.03588). These findings lay a foundation for DNA barcode development, species classification, and phylogenetic analysis.
During evolution, most terrestrial plants have relatively conserved chloroplast genomes, but it is normal that the size and location of inverted repeat sequences and single-copy region boundaries may change slightly [,]. In this study, we found no evidence of significant gene rearrangements or inversions in the chloroplast genes of eight Brassicaceae species, including B. juncea var. multiceps. While minor expansion and contraction of genome boundaries were observed, the overall variation range was small. The IR regions demonstrate significantly higher structural stability compared with single-copy regions. These findings reinforce the importance of the IR region in chloroplast genome stability [].
To clarify the evolutionary position of B. juncea var. multiceps, we built phylogenetic trees from 26 Brassicaceae chloroplast genomes using maximum likelihood (ML) and Bayesian inference (BI) methods. The two methods gave consistent, well-supported topologies. B. juncea var. multiceps was nested within a highly supported clade comprising two distinct subgroups: (1) core Brassica crops with A/C genomes: B. juncea (AABB), B. rapa (AA), B. oleracea (CC), and B. napus (AACC); and (2) non-Brassica genera: R. sativus and E. vesicaria subsp. sativa. This clustering pattern indicates that B. juncea var. multiceps is closely related to these A/C genome-related Brassica species, consistent with its taxonomic classification as a variety of B. juncea []. It is also consistent with a maternal origin from a B. rapa-like ancestor []. In addition, two Brassica species with B genome backgrounds, B. nigra (BB) and B. carinata (BBCC allotetraploid), form a separate clade with 100% bootstrap support, which also includes the non-Brassica taxa Mustarda arvensis L., Sinapis alba L., and Crambe hispanica L. subsp. abyssinica (Hochst. ex A.Rich.) Prantl.

These results indicate that Brassica sensu stricto (s.s.) is non-monophyletic: the two Brassica subclades do not form a single exclusive clade containing all sampled Brassica species. Instead, the B genome subclade shares a more recent common ancestor with non-Brassica genera than with the A/C genome subclade, and therefore fails the criterion of monophyly. Furthermore, non-Brassica genera are not resolved as a separate outgroup but are interspersed among the Brassica subclades, providing additional support for the non-monophyly of Brassica s.s.

The non-monophyly observed in this study is consistent with several previous investigations. Li et al. [] conducted evolutionary analyses using chloroplast genomes from 60 accessions representing six Brassica species and found that the genetic distances between the chloroplast genomes of B. nigra and B. carinata and those of B. oleracea, B. napus, B. juncea, and B. rapa were significantly greater than the distances among the latter four species. Wang et al. [] analyzed B. oleracea L. var. alboglabra (L.H. Bailey) Musil and related Brassicaceae species using chloroplast genomes and found that B. nigra, B. carinata, Sinapis arvensis L. (a non-Brassica species in Brassicaceae), and S. alba clustered together rather than grouping with other Brassica species. Zhang et al. [] analyzed B. oleracea L. var. italica Plenck and related Brassicaceae species using chloroplast genomes and observed that B. nigra clustered with S. arvensis instead of other Brassica species. Collectively, these studies support the hypothesis proposed by Pradhan et al. [] that crops in the tribe Brassiceae can be divided into two deeply divergent lineages: the Brassica lineage (including B. rapa, B. oleracea, B. juncea, B. napus, and R. sativus) and the Sinapis lineage (including B. nigra, B. carinata, and S. arvensis). Our results further show that B. juncea var. multiceps belongs to the Brassica lineage, thereby resolving its evolutionary position at a finer scale relative to both core Brassica crops and non-Brassica taxa in Brassicaceae.
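The monophyly criterion used in this argument can be illustrated with a minimal Python sketch. The nested-tuple tree below is a hypothetical, heavily simplified toy topology loosely modeled on the two lineages described above; it is not the ML or BI tree from this study, and the function names are illustrative assumptions. A set of taxa is treated as monophyletic only if some clade of the rooted tree contains exactly those taxa and no others.

```python
# Hypothetical toy topology (NOT the tree inferred in this study):
# leaves are taxon names, internal nodes are tuples of children.
TOY_TREE = (
    # "Brassica lineage" (A/C genome-related taxa plus Raphanus)
    ((("B. juncea var. multiceps", "B. juncea"), "B. rapa"),
     ("B. oleracea", "Raphanus sativus")),
    # "Sinapis lineage" (B genome-related taxa plus non-Brassica genera)
    (("B. nigra", "B. carinata"), ("Sinapis alba", "Crambe hispanica")),
)


def leaf_set(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    result = frozenset()
    for child in node:
        result |= leaf_set(child)
    return result


def clades(node):
    """Yield the leaf set of every clade (subtree) in a rooted tree."""
    yield leaf_set(node)
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)


def is_monophyletic(tree, taxa):
    """True if some clade contains exactly the given taxa and nothing else."""
    target = frozenset(taxa)
    return any(clade == target for clade in clades(tree))


# In this toy example, every taxon whose name starts with "B. " is a Brassica.
brassica = {name for name in leaf_set(TOY_TREE) if name.startswith("B. ")}
print(is_monophyletic(TOY_TREE, brassica))                     # False
print(is_monophyletic(TOY_TREE, {"B. nigra", "B. carinata"}))  # True
```

In the toy tree, the smallest clade that contains every Brassica taxon is the root, which also contains Raphanus, Sinapis, and Crambe, so the Brassica set fails the test, while the B genome pair passes it.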

5. Conclusions

This research showed that the chloroplast genome of B. juncea var. multiceps has a typical quadripartite structure, with a total length of 153,483 bp. Comprehensive annotation identified 132 genes, primarily related to photosynthesis and chloroplast self-replication. Thirty-one codons had RSCU > 1, and 93.55% of these preferred codons ended in A/U. Additionally, 37 dispersed repeats and 315 SSRs were identified. During evolution, the chloroplast genomes of Brassicaceae crops, including B. juncea var. multiceps, mainly underwent purifying selection and remained relatively conserved. Phylogenetic analysis showed that B. juncea var. multiceps is most closely related to B. juncea. These results clarify the evolutionary position and genetic lineage of B. juncea var. multiceps. They also lay a foundation for optimizing breeding strategies, not only for this variety but also for related Brassicaceae species, thereby facilitating more targeted and efficient genetic improvement.
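For readers unfamiliar with the RSCU metric summarized above, the following Python sketch shows one common way to compute it: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It is a minimal illustration, not the pipeline used in this study. The function name `rscu`, the toy sequence, and the use of the standard genetic code table (which assigns codons to amino acids in the same way as the plastid table) are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Standard genetic code in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    RSCU = observed count * (number of synonymous codons)
           / (total count for that amino acid).
    Stop codons and amino acids absent from the sequence are skipped.
    """
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA input
    usable = len(seq) - len(seq) % 3     # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons by amino acid.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy sequence (hypothetical): four leucine codons, TTA x3 and CTG x1.
values = rscu("TTATTATTACTG")
preferred = sorted(c for c, v in values.items() if v > 1)
# Fraction of preferred codons ending in A or T (U in RNA).
at_ending = sum(c[-1] in "AT" for c in preferred) / len(preferred)
print(preferred, at_ending)
```

Applied to the concatenated protein-coding genes of a chloroplast genome, the `preferred` list and `at_ending` fraction would correspond to the kind of summary reported here (codons with RSCU > 1 and the share of those ending in A/U).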

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agronomy15112501/s1. Table S1: Gene annotation of the Brassica juncea var. multiceps chloroplast genome; Table S2: Analysis of relative synonymous codon usage (RSCU) in the B. juncea var. multiceps chloroplast genome; Table S3: The analysis of the number of each SSR type in the B. juncea var. multiceps chloroplast genome; Table S4: The Ka/Ks analysis of genes in the chloroplast genome; Table S5: Statistical analysis of Pi values in chloroplast genomes.

Author Contributions

Conceptualization, T.L.; methodology, Z.H.; software, L.X.; validation, X.L. and L.Z.; formal analysis, S.L.; investigation, C.C.; resources, T.L.; data curation, T.L.; writing—original draft preparation, T.L.; writing—review and editing, X.A.; visualization, Z.H.; supervision, X.A.; project administration, T.L.; funding acquisition, T.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant numbers 32202506 and 32202508.

Data Availability Statement

The data presented in this study are openly available from the NCBI at https://www.ncbi.nlm.nih.gov/nuccore/PX240752.1 (accessed on 2 September 2025), reference number PX240752.1.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BI: Bayesian inference
C: Complement
F: Forward
IR: Inverted repeat
LD: Linear dichroism
LSC: Large single-copy
MCMC: Markov Chain Monte Carlo
ML: Maximum likelihood
P: Palindromic
PPs: Posterior probabilities
R: Reverse
RSCU: Relative synonymous codon usage
SSC: Small single-copy
SSRs: Simple sequence repeats

References

  1. Daniell, H.; Lin, C.S.; Yu, M.; Chang, W.J. Chloroplast genomes: Diversity, evolution, and applications in genetic engineering. Genome Biol. 2016, 17, 134.
  2. de Vere, N.; Rich, T.C.G.; Trinder, S.A.; Long, C. DNA barcoding for plants. Methods Mol. Biol. 2015, 1245, 101–118.
  3. Wicke, S.; Schneeweiss, G.M.; dePamphilis, C.W.; Müller, K.F.; Quandt, D. The evolution of the plastid chromosome in land plants: Gene content, gene order, gene function. Plant Mol. Biol. 2011, 76, 273–297.
  4. Bock, R. Structure, function, and inheritance of plastid genomes. Top. Curr. Genet. 2007, 19, 29–63.
  5. Liu, L.M.; Du, X.Y.; Guo, C.; Li, D.Z. Resolving robust phylogenetic relationships of core Brassicaceae using genome skimming data. J. Syst. Evol. 2020, 59, 442–453.
  6. Yang, J.; Liu, D.; Wang, X.; Ji, C.; Cheng, F.; Liu, B.; Hu, Z.; Chen, S.; Pental, D.; Ju, Y.; et al. The genome sequence of allopolyploid Brassica juncea and analysis of differential homoeolog gene expression influencing selection. Nat. Genet. 2016, 48, 1225–1232.
  7. U, N. Genome analysis in Brassica with special reference to the experimental formation of B. napus and peculiar mode of fertilization. J. Jpn. Bot. 1935, 7, 389–452.
  8. Kang, L.; Qian, L.; Zheng, M.; Chen, L.; Chen, H.; Yang, L.; You, L.; Yang, B.; Yan, M.; Gu, Y.; et al. Genomic insights into the origin, domestication and diversification of Brassica juncea. Nat. Genet. 2021, 53, 1392–1402.
  9. Zhong, X.F.; Hu, Y.X.; Liu, D.H.; Chen, J.C.; Ye, X.Q. Changes of phenolic acids and antioxidant activities during potherb mustard (Brassica juncea, Coss.) pickling. Food Chem. 2008, 108, 811–817.
  10. Liu, D.; Zhang, C.; Zhang, J.; Xin, X.; Liao, X. Metagenomics reveals the formation mechanism of flavor metabolites during the spontaneous fermentation of potherb mustard (Brassica juncea var. multiceps). Food Res. Int. 2021, 148, 110622.
  11. Zhang, C.C.; Chen, J.B.; Li, X.Q.; Liu, D.Q. Bacterial community and quality characteristics of the fermented potherb mustard (Brassica juncea var. multiceps) under modified atmosphere. Food Res. Int. 2018, 116, 266–275.
  12. Zhao, D.; Tang, J.; Ding, X. Analysis of volatile components during potherb mustard (Brassica juncea, Coss.) pickle fermentation using SPME-GC-MS. LWT-Food Sci. Technol. 2007, 40, 439–447.
  13. Kong, X.; Zhang, J.; Shen, H.; Shi, N.; Zhou, H.; Li, Y.; Guo, Y.; Luo, H.; Yu, L. Screening, identification, and fermentation characteristics of lactic acid bacteria from pickled potherb mustard and potential applications. Foods 2025, 14, 1431.
  14. Sato, S.; Nakamura, Y.; Kaneko, T.; Asamizu, E.; Tabata, S. Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 1999, 6, 283–290.
  15. Jeong, Y.M.; Chung, W.H.; Mun, J.H.; Kim, N.; Yu, H.J. De novo assembly and characterization of the complete chloroplast genome of radish (Raphanus sativus L.). Gene 2014, 551, 39–48.
  16. Seol, Y.J.; Kim, K.; Kang, S.H.; Perumal, S.; Lee, J.; Kim, C.K. The complete chloroplast genome of two Brassica species, Brassica nigra and B. oleracea. Mitochondrial DNA A 2015, 28, 167–168.
  17. Zhou, X.; Ren, H.; Zhang, J.; Xu, D.; Xiao, W.; Huang, H.; Li, G.; Zhang, H.; Zheng, Y. The complete chloroplast genome of Brassica rapa var. purpuraria (L.H.Bailey) Kitam 1950 and its phylogenetic analysis. Mitochondrial DNA B 2024, 9, 143–147.
  18. Chen, S.; Zhou, Y.; Chen, Y.; Jia, G. Fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 2018, 34, 884–890.
  19. Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A.A.; Dvorkin, M.; Kulikov, A.S.; Lesin, V.M.; Nikolenko, S.I.; Pham, S.; Prjibelski, A.D.; et al. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 2012, 19, 455–477.
  20. Hyatt, D.; Chen, G.L.; LoCascio, P.; Land, M.; Larimer, F.; Hauser, L. Prodigal: Prokaryotic gene recognition and translation initiation site identification. BMC Bioinform. 2010, 11, 119.
  21. Mistry, J.; Finn, R.D.; Eddy, S.R.; Bateman, A.; Punta, M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 2013, 41, e121.
  22. Laslett, D.; Canback, B. ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 2004, 32, 11–16.
  23. Greiner, S.; Lehwark, P.; Bock, R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1, expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res. 2019, 47, 59–64.
  24. Beier, S.; Thiel, T.; Münch, T.; Scholz, U.; Mascher, M. MISA-web: A web server for microsatellite prediction. Bioinformatics 2017, 33, 2583–2585.
  25. Nakamura, T.; Yamada, K.D.; Tomii, K.; Katoh, K. Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 2018, 34, 2490–2492.
  26. Librado, P.; Rozas, J. DnaSP v5, a software for comprehensive analysis of DNA polymorphism data. Bioinformatics 2009, 25, 1451–1452.
  27. Wang, D.; Zhang, Y.; Zhang, Z.; Zhu, Z.; Yu, J. KaKs_Calculator 2.0, a toolkit incorporating gamma-series methods and sliding window strategies. Genom. Proteom. Bioinf. 2010, 8, 77–80.
  28. Darling, A.C.E.; Mau, B.; Blattner, F.R.; Perna, N.T. Mauve: Multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004, 14, 1394–1403.
  29. Chase, M.W.; Christenhusz, M.J.M.; Fay, M.F.; Byng, J.W.; Judd, W.S.; Soltis, D.E.; Mabberley, D.J.; Sennikov, A.N.; Soltis, P.S.; Stevens, P.F. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG IV. Bot. J. Linn. Soc. 2016, 181, 1–20.
  30. Zhang, Z.; Tao, M.; Shan, X.; Pan, Y.; Sun, C.; Song, L.; Pei, X.; Jing, Z.; Dai, Z. Characterization of the complete chloroplast genome of Brassica oleracea var. italica and phylogenetic relationships in Brassicaceae. PLoS ONE 2022, 17, e0263310.
  31. Silvestro, D.; Michalak, I. raxmlGUI: A graphical front-end for RAxML. Org. Divers. Evol. 2012, 12, 335–337.
  32. Huelsenbeck, J.P. MrBayes 3.2, efficient bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 2012, 61, 539–542.
  33. Xu, J.; Liu, Q.; Hu, W.; Wang, T.; Xue, Q.; Messing, J. Dynamics of chloroplast genomes in green plants. Genomics 2015, 106, 221–231.
  34. Prabhudas, S.K.; Raju, B.; Kannan, T.S.; Parani, M.; Natarajan, P. The complete chloroplast genome sequence of Indian mustard (Brassica juncea L.). Mitochondrial DNA A 2015, 27, 4622–4623.
  35. Liu, F.; Chen, N.; Wang, H.; Li, J.; Wang, J.; Qu, F. Novel insights into chloroplast genome evolution in the green macroalgal genus Ulva (Ulvophyceae, Chlorophyta). Front. Plant Sci. 2023, 14, 1126175.
  36. Zhao, M.; Wu, Y.; Ren, Y. Complete chloroplast genome sequence structure and phylogenetic analysis of kohlrabi (Brassica oleracea var. gongylodes L.). Genes 2024, 15, 550.
  37. Liu, P.H.; Yuan, Q.; Liu, H.; Qin, L.L.; Wei, Y.; Li, X.M.; Ren, F.; Ma, X.L.; Liu, H.R. Comprehensive analysis of complete chloroplast genome sequence of Morina L. Sci. Rep. 2025, 15, 14858.
  38. Su, Y.; Wei, W.; Han, L.; Wen, H.; Lu, H. Comprehensive analysis of complete chloroplast genome sequence of Cardiospermum halicacabum L. (Sapindaceae). Sci. Rep. 2025, 15, 22457.
  39. Park, I.; Yang, S.; Choi, G.; Kim, W.J.; Moon, B.C. The complete chloroplast genome sequences of Aconitum pseudolaeve and Aconitum longecassidatum, and development of molecular markers for distinguishing species in the Aconitum Subgenus Lycoctonum. Molecules 2017, 22, 2012.
  40. Yang, Z.; Nielsen, R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 2000, 17, 32–43.
  41. Peng, L.; Yamamoto, H.; Shikanai, T. Structure and biogenesis of the chloroplast NAD(P)H dehydrogenase complex. BBA-Bioenergetics 2010, 1807, 945–953.
  42. Lubna; Jan, R.; Hashmi, S.S.; Asif, S.; Bilal, S.; Waqas, M.; Abdelbacki, A.M.M.; Kim, K.M.; Harrasi, A.A.; Asaf, S. The first complete chloroplast genome of spider flower (Cleome houtteana) providing a genetic resource for understanding Cleomaceae evolution. Int. J. Mol. Sci. 2025, 26, 3527.
  43. Loeuille, B.; Thode, V.; Siniscalchi, C.; Andrade, S.; Rossi, M.; Pirani, J.R. Extremely low nucleotide diversity among thirty-six new chloroplast genome sequences from Aldama (Heliantheae, Asteraceae) and comparative chloroplast genomics analyses with closely related genera. PeerJ 2021, 9, e10886.
  44. Zhang, R.; Zhang, L.; Wang, W.; Zhang, Z.; Du, H.; Qu, Z.; Li, X.; Xiang, H. Differences in codon usage bias between photosynthesis-related genes and genetic system-related genes of chloroplast genomes in cultivated and wild Solanum species. Int. J. Mol. Sci. 2018, 19, 3142.
  45. Dong, F.; Lin, Z.; Lin, J.; Ming, R.; Zhang, W. Chloroplast genome of rambutan and comparative analyses in Sapindaceae. Plants 2021, 10, 283.
  46. Wang, H.X.; Liu, H.; Moore, M.J.; Landrein, S.; Liu, B.; Zhu, Z.X.; Wang, H.F. Plastid phylogenomic insights into the evolution of the Caprifoliaceae s.l. (Dipsacales). Mol. Phylogenet. Evol. 2020, 142, 10664.
  47. Li, P.; Zhang, S.; Li, F.; Zhang, S.; Zhang, H.; Wang, X.; Sun, R.; Bonnema, G.; Borm, T.J.A. A phylogenetic analysis of chloroplast genomes elucidates the relationships of the six economically important Brassica species comprising the triangle of U. Front. Plant Sci. 2017, 8, 111.
  48. Wang, Y.; Liang, Q.; Zhang, C.; Huang, H.; He, H.; Wang, M.; Li, M.; Huang, Z.; Tang, Y.; Chen, Q.; et al. Sequencing and analysis of complete chloroplast genomes provide insight into the evolution and phylogeny of Chinese kale (Brassica oleracea var. alboglabra). Int. J. Mol. Sci. 2023, 24, 10287.
  49. Pradhan, A.K.; Prakash, S.; Mukhopadhyay, A.; Pental, D. Phylogeny of Brassica and allied genera based on variation in chloroplast and mitochondrial DNA patterns: Molecular and taxonomic classifications are incongruous. Theor. Appl. Genet. 1992, 85, 331–340.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
