Article

A High-Continuity Genome Assembly of Chinese Flowering Cabbage (Brassica rapa var. parachinensis) Provides New Insights into Brassica Genome Structure Evolution

1 Guangzhou Academy of Agricultural Sciences, Guangzhou 510335, China
2 College of Horticulture, South China Agricultural University, Guangzhou 510642, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Plants 2023, 12(13), 2498; https://doi.org/10.3390/plants12132498
Submission received: 21 March 2023 / Revised: 19 June 2023 / Accepted: 27 June 2023 / Published: 29 June 2023
(This article belongs to the Special Issue Molecular Biology of Plant Growth and Development)

Abstract

Chinese flowering cabbage (Brassica rapa var. parachinensis) is a popular and widely cultivated leafy vegetable crop in Asia. Here, we performed a high-quality de novo assembly of the 384 Mb, 10-chromosome genome of a typical Chinese flowering cabbage cultivar using an integrated approach combining PacBio, Illumina, and Hi-C technologies. We predicted 47,598 protein-coding genes and annotated 52% (205.9/384 Mb) of the genome as repetitive sequences, including 17% DNA transposons and 22% long terminal repeat retrotransposons (LTRs). Phylogenetic analysis reveals that the Chinese flowering cabbage genome is evolutionarily closer to the AA diploid progenitor of the allotetraploid species Brassica juncea. Comparative genomic analysis of Brassica species with different subgenome types (A, B, and C) reveals that the pericentromeric regions of chromosomes 5 and 6 of the AA genome have been significantly expanded relative to the orthologous regions in the BB and CC genomes, largely driven by LTR-retrotransposon amplification. Furthermore, we identified a large number of structural variations (SVs) between B. rapa lines that could affect coding genes, suggesting that SVs are functionally significant in Brassica genome evolution. Overall, our high-quality genome assembly of the Chinese flowering cabbage provides a valuable genetic resource for deciphering Brassica genome evolution and can potentially serve as a reference genome to guide molecular breeding of B. rapa crops.

1. Introduction

Brassica, a genus in the Brassicaceae family, is among the most economically important plant genera because it contains a wide range of staple vegetables and oilseed crops. Over the course of its evolution, Brassica underwent an additional whole-genome triplication (WGT) event after it diverged from Arabidopsis [1,2]. Thus, Brassica species display not only great morphological and phytochemical diversity but also karyotype diversity [2,3]. The most agriculturally important Brassica species comprise three diploids, Brassica rapa (AA), Brassica nigra (BB), and Brassica oleracea (CC), and three allopolyploids formed by pairwise hybridization of these diploids: Brassica napus (AACC), Brassica juncea (AABB), and Brassica carinata (BBCC). The origins and relationships of these six species are described by the ‘triangle of U’ model [3,4].
Owing to rapid advances in sequencing technology, especially next-generation sequencing (NGS), a large number of Brassica species have been sequenced, but most of the resulting assemblies have low contiguity. Genomes sequenced with Illumina/Roche 454 technology, including B. rapa var. pekinensis Chiifu [5], B. oleracea 02-12 [6], B. oleracea TO1000DH [7], B. nigra YZ12151 [4], B. napus [8,9,10], and B. juncea [3,4], have relatively low contiguity, which can impede genomic analysis, especially of complex regions such as pericentromeres and centromeres. More recently, long-read sequencing technologies, including Oxford Nanopore Technology (ONT) and Pacific Biosciences (PacBio), have greatly improved the contiguity of assembled contigs. A growing number of Brassica genomes have been sequenced with long-read technology, with contig N50 values reaching the megabase scale, including B. oleracea cultivar HDEM and Brassica rapa Z1 (yellow sarson) [11], B. oleracea var. botrytis [12], B. napus [13], Brassica rapa [14,15,16], and pak choi [17,18]. Considerable progress has been made in improving Brassica genome assemblies through single-molecule sequencing, optical mapping, and chromosome conformation capture technologies. Long-read sequencing overcomes the limitations of short-read sequencing by producing reads tens of kilobases (kb) long that can span the repetitive regions of Brassica genomes. These studies have produced highly contiguous Brassica assemblies (e.g., contig N50 > 5 Mb) [11].
Given the great morphological and phytochemical diversity of Brassica species, genome information from a wide range of representative Brassica species is needed to decipher the genomic variants that may underlie the diversity of both phenotype and karyotype among cultivars.
The Chinese flowering cabbage (Brassica rapa var. parachinensis), locally known as Caixin, Tsai Tai, Choy Sum, bok choy, or Tsai Hsin [19,20], is an important leafy and bolting-stem vegetable widely grown in Asia, particularly in China, Japan, and Korea [21]. It has high nutritional value and is rich in vitamins, minerals, secondary metabolites, and dietary fiber, which can promote human health [20]. Unlike other B. rapa vegetables, Chinese flowering cabbage bolts and flowers readily without a strict low-temperature vernalization requirement. A reference genome for this cultivar is therefore needed to uncover the genomic information and molecular mechanisms underlying its distinctive morphological and phytochemical characteristics.
In this study, we report a highly contiguous (contig N50 = 7.2 Mb), chromosome-level genome assembly of the Chinese flowering cabbage (B. rapa var. parachinensis), assembled with an integrated approach using Illumina sequencing, PacBio sequencing, and high-throughput chromosome conformation capture (Hi-C) technology. The assembly resolves a large part of the pericentromeric regions of this species. We also performed genome comparison and evolutionary analysis between this genome and other representative Brassica species. The results provide new insights into Brassica genome structure evolution.

2. Materials and Methods

2.1. Sample Collection

Young leaves were collected from a single plant of B. rapa var. parachinensis cv. Youlv 701 (Figure 1), a highly inbred line released by the Guangzhou Institute of Agriculture Science in Guangzhou, Guangdong, China. The leaves were immediately frozen in liquid nitrogen and stored at −80 °C for DNA and RNA extraction.

2.2. DNA Extraction and Sequencing

For Illumina sequencing, DNA was extracted from 2 g of young leaves using a phenol/chloroform protocol. An Illumina sequencing library with a 250 bp insert size was prepared using the TruSeq Nano DNA LT Library Preparation Kit (Illumina Inc., San Diego, CA, USA). DNA purity and size range were evaluated with an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA). A paired-end (PE) library with a 300–350 bp insert size was constructed and sequenced on the Illumina HiSeq 2000 platform.
The DNA extracted from the young leaves was also used to construct the PacBio sequencing library. Following the manufacturer’s protocol (Pacific Biosciences, Menlo Park, CA, USA), 10 μg of Chinese flowering cabbage genomic DNA was used to prepare a 30 kb template library, with size selection on the BluePippin Size Selection system (Sage Science, Waltham, MA, USA). The library was sequenced on the PacBio Sequel II platform.
The PacBio long reads were used to construct the reference genome of the Chinese flowering cabbage. After adapter removal, more than 113 Gb of subreads were obtained, corresponding to approximately 219× coverage. These data were used for the genome assembly described below.
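The depth figures in this paragraph (and the ~128× Hi-C depth reported in Section 3.1) are sequenced bases divided by genome size. A minimal Python sketch of that arithmetic, using the 515 Mb k-mer-based genome size estimate reported in Section 3.1 as the denominator:

```python
def sequencing_depth(total_bases: float, genome_size_bp: float) -> float:
    """Average fold coverage: sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


GENOME_SIZE_BP = 515e6  # k-mer-based estimate reported for this genome

print(f"PacBio subreads: {sequencing_depth(113e9, GENOME_SIZE_BP):.0f}x")  # 219x
print(f"Hi-C reads:      {sequencing_depth(66e9, GENOME_SIZE_BP):.0f}x")   # 128x
```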

2.3. Genome Size Estimation Based on NGS Sequencing Data

Low-quality bases and reads were filtered with the HTQC package [22] in three steps: first, adapter sequences were removed from the reads; second, reads with more than 10% N bases were discarded; and third, reads in which more than 50% of bases had a quality score of ≤5 were discarded. In total, 42.3 Gb (~86×) of clean data were obtained for the k-mer-based analysis. We also randomly selected 10,000 read pairs and searched them with BLAST against the NCBI non-redundant nucleotide (nt) database to check for obvious sample contamination.
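The three cleaning rules can be expressed as a small per-read filter. The Python sketch below is a hypothetical re-implementation for illustration, not HTQC's code; it assumes Phred+33 quality encoding and a single exact-match adapter prefix (AGATCGGAAGAGC), neither of which is specified in the text:

```python
from typing import Optional, Tuple

# Hypothetical adapter prefix (common Illumina TruSeq start); the paper does
# not list the adapter sequences that were removed.
ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq: str, qual: str, adapter: str = ADAPTER) -> Tuple[str, str]:
    """Truncate the read at the first exact match of the adapter, if present."""
    pos = seq.find(adapter)
    return (seq, qual) if pos == -1 else (seq[:pos], qual[:pos])


def passes_filters(seq: str, qual: str, max_n_frac: float = 0.10,
                   low_q: int = 5, max_low_frac: float = 0.50,
                   offset: int = 33) -> bool:
    """Apply the N-content and low-quality-base rules (Phred+33 assumed)."""
    if not seq:
        return False  # read consisted entirely of adapter
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False  # rule 2: more than 10% N bases
    low = sum(1 for c in qual if ord(c) - offset <= low_q)
    return low / len(qual) <= max_low_frac  # rule 3: >50% bases with Q <= 5


def clean_read(seq: str, qual: str) -> Optional[Tuple[str, str]]:
    """Return the cleaned (seq, qual) pair, or None if the read is discarded."""
    seq, qual = trim_adapter(seq, qual)  # rule 1: adapter removal
    return (seq, qual) if passes_filters(seq, qual) else None


print(clean_read("ACGTAGATCGGAAGAGCTT", "I" * 19))  # ('ACGT', 'IIII')
print(clean_read("ACNTNCGTAC", "I" * 10))           # None (20% N)
```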

2.4. De Novo Assembly of the Chinese Flowering Cabbage Genome

The Chinese flowering cabbage genome was assembled with the MECAT2 package [23], using a long-read length cutoff of 10 kb. The assembly was then polished with two rounds of Pilon [24] using NGS short reads. Tandem repeats were identified with TRF (Tandem Repeats Finder) [25], and contigs consisting of more than 60% tandem repeats were removed. The completeness of the assembled genome was evaluated with BUSCO v3.0 [26].
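The 10 kb read-length cutoff and the 60% tandem-repeat contig filter amount to simple length and interval-coverage tests. The sketch below assumes TRF hits have already been parsed into half-open (start, end) intervals per contig (the function names and input layout are hypothetical); it counts overlapping hits only once, which is an assumption about how the 60% fraction was measured:

```python
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a contig


def keep_long_reads(read_lengths: Iterable[int], min_len: int = 10_000) -> List[int]:
    """Drop reads shorter than the 10 kb cutoff used before assembly."""
    return [n for n in read_lengths if n >= min_len]


def merged_coverage(intervals: Iterable[Interval]) -> int:
    """Total bp covered by the union of possibly overlapping intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # extend the overlapping run
        else:
            merged.append([start, end])
    return sum(end - start for start, end in merged)


def filter_contigs(contig_lengths: Dict[str, int],
                   trf_hits: Dict[str, List[Interval]],
                   max_repeat_frac: float = 0.60) -> List[str]:
    """Names of contigs whose tandem-repeat fraction is at most max_repeat_frac."""
    kept = []
    for name, length in contig_lengths.items():
        frac = merged_coverage(trf_hits.get(name, [])) / length
        if frac <= max_repeat_frac:
            kept.append(name)
    return kept


lengths = {"ctg1": 1000, "ctg2": 1000}
hits = {"ctg1": [(0, 300), (200, 500)], "ctg2": [(0, 700)]}
print(filter_contigs(lengths, hits))  # ['ctg1'] (ctg2 is 70% tandem repeat)
```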

2.5. Hi-C Library Preparation and Data Analysis

In the present study, 8 g of young leaf tissue collected from the same B. rapa var. parachinensis plant was used for Hi-C library construction. The Hi-C experiment consisted of the following steps: crosslinking, lysis, chromatin digestion, biotin labeling, proximity ligation, crosslink reversal, and DNA purification [27]. The purified and enriched DNA was used to construct the sequencing library, which was sequenced on the Illumina HiSeq 2000 platform (Illumina, San Diego, CA, USA). Assembled contigs were scaffolded using Juicer [28] and 3D-DNA [29]. MCScanX [30] was used to compare collinearity between the scaffolds and an existing B. rapa genome [31], and the pseudochromosomes were renamed according to their synteny with B. rapa Z1.
Each read of a Hi-C pair was mapped separately to the chromosome-level genome sequence using bwa mem [32] with the parameters “-A1 -B4 -E50 -L0”. HiCExplorer [33] was then used to build the Hi-C contact map. For the hicCorrectMatrix step, the parameter “--filterThreshold -3.5 5” was used, and all other parameters were left at their default settings.
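The two --filterThreshold values are lower and upper bounds on a per-bin modified z-score, used to discard bins with abnormally low or high coverage before matrix correction. As a rough sketch only, assuming the score is the median-absolute-deviation (MAD) based modified z-score computed on raw per-bin contact totals (HiCExplorer's internal computation may differ, for example in how counts are transformed), the filter looks like this:

```python
import statistics
from typing import List, Sequence


def modified_z_scores(values: Sequence[float]) -> List[float]:
    """Iglewicz-Hoaglin modified z-score: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]


def bins_to_keep(bin_totals: Sequence[float], lower: float = -3.5,
                 upper: float = 5.0) -> List[int]:
    """Indices of bins whose modified z-score lies inside [lower, upper]."""
    return [i for i, z in enumerate(modified_z_scores(bin_totals))
            if lower <= z <= upper]


# Per-bin total contacts for a toy matrix: one empty bin, one over-covered bin.
totals = [100, 102, 98, 101, 99, 0, 500]
print(bins_to_keep(totals))  # [0, 1, 2, 3, 4]
```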

2.6. Single Molecule RNA Sequencing (Iso-Seq) Experiment and Data Analysis

For genome annotation, transcriptome sequencing was performed on pooled leaves and roots of a young seedling (14 days after imbibition). RNA was extracted with TRIzol Reagent (Invitrogen, Waltham, MA, USA). RNA quality was checked with a spectrophotometer (LabTech, Hopkinton, MA, USA) and a 2100 Bioanalyzer (Agilent Technologies, USA). The qualified RNA was used to construct the transcriptome sequencing libraries. Briefly, mRNA was reverse-transcribed using a Clontech SMARTer cDNA synthesis kit. After cDNA amplification and purification, size selection was performed with the BluePippin Size Selection System (Sage Science, Waltham, MA, USA) to produce two libraries of 0–3 kb and 2–6 kb, respectively. The SMRTbell libraries were constructed according to the manufacturer’s protocol and sequenced on the PacBio Sequel II platform (Pacific Biosciences of California, Menlo Park, CA, USA). Finally, SMRT Link 7.0 (https://www.pacb.com/support/software-downloads/) (accessed on 15 January 2022) was used to generate the full-length transcript sequences for genome annotation.

2.7. Repetitive Element Annotation and Circos Plot Construction

The Extensive de-novo TE Annotator (EDTA) [34] was used to annotate DNA transposons (DNA TEs) and LTR retrotransposons in the genome. TRF (Tandem Repeats Finder) [25] was used to identify centromeric sequences, with a threshold of 20,000. The repeat sequences were then annotated with MAKER [35]. MCScanX [30] was used to identify collinear blocks from the comparison results and to generate link files. From the outermost to the innermost, the four Circos [36] tracks show gene density, LTR density, DNA TE density, and total TE density, and collinearity within the genome is shown in the inner circle.
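Each Circos density track is a per-window summary of an annotation. A minimal sketch, assuming features are non-overlapping half-open intervals and a hypothetical window size, that computes the fraction of each window covered and writes "chromosome start end value" records of the kind Circos reads as data tracks:

```python
from typing import List, Sequence, Tuple


def window_coverage(features: Sequence[Tuple[int, int]], chrom_len: int,
                    window: int) -> List[Tuple[int, int, float]]:
    """Fraction of each window covered by features (half-open [start, end)).

    Features are assumed non-overlapping and inside the chromosome; merge
    overlapping annotations first, or the fraction can exceed 1.
    """
    n_windows = (chrom_len + window - 1) // window
    covered = [0] * n_windows
    for start, end in features:
        for w in range(start // window, (end - 1) // window + 1):
            lo = max(start, w * window)
            hi = min(end, (w + 1) * window)
            covered[w] += hi - lo  # bp of this feature falling in window w
    rows = []
    for w, bp in enumerate(covered):
        w_start = w * window
        w_end = min(chrom_len, w_start + window)  # last window may be shorter
        rows.append((w_start, w_end, bp / (w_end - w_start)))
    return rows


def circos_lines(chrom: str, rows: List[Tuple[int, int, float]]) -> List[str]:
    """Format rows as whitespace-separated 'chrom start end value' records."""
    return [f"{chrom} {s} {e} {v:.4f}" for s, e, v in rows]


ltrs = [(0, 50), (150, 250)]  # toy LTR intervals on a 250 bp "chromosome"
for line in circos_lines("A05", window_coverage(ltrs, chrom_len=250, window=100)):
    print(line)
```

For real data, the window would typically be on the order of hundreds of kilobases to 1 Mb, and gene density could be computed the same way from gene intervals.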

2.8. Protein Coding Gene Prediction

The Isoseq3 pipeline (https://github.com/pacificbiosciences/isoseq) (accessed on 7 February 2022) was used to process the full-length transcriptome data of the Chinese flowering cabbage. To obtain a more complete gene annotation, we also combined the gene annotations of B. juncea [4], B. napus [8], B. oleracea [6], B. rapa [16], and B. nigra [3] into a reference gene set and removed redundant sequences with CD-HIT-EST (https://github.com/weizhongli/cdhit) (accessed on 10 February 2022). The repeats identified by EDTA [34] and TRF [25] were supplied to MAKER [35] as the reference repeat library for five rounds of gene and repeat annotation.
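CD-HIT-EST removes redundancy by greedy incremental clustering: sequences are processed from longest to shortest, and each one either joins the first existing representative it matches above an identity threshold or becomes a new representative. The sketch below illustrates only that greedy logic; it replaces CD-HIT's word filtering and alignment with a simplified ungapped identity, and the threshold is illustrative because the cutoff used in this study is not reported:

```python
from typing import Dict, List


def ungapped_identity(short: str, long_: str) -> float:
    """Best fraction of identical bases of `short` over all ungapped placements in `long_`."""
    if not short or len(short) > len(long_):
        return 0.0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        best = max(best, sum(a == b for a, b in zip(short, window)))
    return best / len(short)


def greedy_cluster(seqs: Dict[str, str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Greedy incremental clustering: longest sequences become representatives."""
    clusters: Dict[str, List[str]] = {}
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if ungapped_identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)  # redundant: absorbed by this representative
                break
        else:
            clusters[name] = [name]  # no match: becomes a new representative
    return clusters


genes = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTTTTT"}
print(greedy_cluster(genes, threshold=0.85))  # {'a': ['a', 'b'], 'c': ['c']}
```

Only the cluster representatives would be kept as the non-redundant reference set.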

2.9. Phylogenetic Analysis

The phylogenetic relationships between the Chinese flowering cabbage and other Brassica plants were inferred from single-copy orthologous genes. Proteomes of 20 eudicot species were retrieved from the Brassica Database (brassicadb.cn): for each species, we downloaded the reference genome and GFF annotation and extracted the protein sequences with gffread using the command “gffread -g $refgenome -y $proteomes $gff_record”. The 20 eudicot species and their references are listed in Table S1. OrthoFinder, with DIAMOND as the sequence search tool, was used to build orthogroups and identify single-copy genes. For each species, all single-copy genes were concatenated into a super-gene sequence, and the concatenated sequences were aligned with MAFFT. EasySpeciesTree (https://github.com/Davey1220/EasySpeciesTree) (accessed on 14 February 2022) was then used to infer the species phylogeny with the maximum likelihood method.
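Selecting single-copy orthogroups and building per-species super-genes can be sketched as follows. This assumes the layout of OrthoFinder's Orthogroups.GeneCount.tsv (an Orthogroup column, one column per species, and a Total column); the helper names are hypothetical and the sequences are toy data:

```python
import csv
import io
from typing import Dict, List


def single_copy_orthogroups(gene_count_tsv: str) -> List[str]:
    """Orthogroups with exactly one gene in every species.

    Assumes an 'Orthogroup' column, one column per species, and a 'Total' column.
    """
    reader = csv.DictReader(io.StringIO(gene_count_tsv), delimiter="\t")
    species = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    return [row["Orthogroup"] for row in reader
            if all(int(row[sp]) == 1 for sp in species)]


def concatenate(per_og: Dict[str, Dict[str, str]], orthogroups: List[str]) -> Dict[str, str]:
    """Join each species' single-copy sequences in one fixed orthogroup order."""
    species = sorted(per_og[orthogroups[0]])
    return {sp: "".join(per_og[og][sp] for og in orthogroups) for sp in species}


table = "Orthogroup\tsp1\tsp2\tTotal\nOG1\t1\t1\t2\nOG2\t2\t1\t3\nOG3\t1\t1\t2\n"
ogs = single_copy_orthogroups(table)  # ['OG1', 'OG3']
seqs = {"OG1": {"sp1": "ATG", "sp2": "ATA"}, "OG3": {"sp1": "CC", "sp2": "CG"}}
print(concatenate(seqs, ogs))  # {'sp1': 'ATGCC', 'sp2': 'ATACG'}
```

Using the same orthogroup order for every species keeps homologous genes in corresponding segments of the super-genes.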

2.10. Structural Variants Analysis

Structural variations were detected with an assembly-based pipeline built on the LASTZ/CHAIN/NET/NETSYNTENY tools [37,38,39,40], which is publicly available at https://github.com/yiliao1022/LASTZ_SV_pipeline (accessed on 18 February 2022). Insertion times of LTR retrotransposons were estimated from the divergence between the two LTRs of each intact element using T = K/(2r), where K is the sequence divergence between the 5′-LTR and 3′-LTR of an intact LTR element and r is the average mutation rate. Here, we used a neutral substitution rate of 1.5 × 10⁻⁸ per synonymous site per generation [41].
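The dating formula T = K/(2r) is straightforward to apply once K has been measured. A minimal sketch, assuming K is the raw proportion of differing ungapped sites between the aligned 5′ and 3′ LTRs and r = 1.5 × 10⁻⁸ substitutions per site, the value commonly used for Brassicaceae LTR dating; a Jukes–Cantor correction is included as an optional alternative that the text does not mention:

```python
import math


def ltr_divergence(ltr5: str, ltr3: str) -> float:
    """Proportion of differing sites between two aligned LTRs (gap columns skipped)."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    sites = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper()) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no ungapped aligned sites")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor(p: float) -> float:
    """Optional JC69 correction of a p-distance (an assumption, not stated in the text)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_time(k: float, r: float = 1.5e-8) -> float:
    """T = K / (2r); T is in the same time unit as the rate r."""
    return k / (2 * r)


k = ltr_divergence("A" * 100, "A" * 97 + "CCC")  # 3 differences in 100 sites
print(k, insertion_time(k))  # 0.03 and about 1.0e6
```

The factor of 2 appears because both LTRs accumulate mutations independently after insertion, starting from identical copies.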

3. Results

3.1. A Highly Continuous Genome Assembly of Chinese Flowering Cabbage (B. rapa var. parachinensis)

A highly inbred line of Chinese flowering cabbage (B. rapa var. parachinensis, Figure 1) was used for genome sequencing and assembly with deep-coverage long reads and Hi-C data. The assembly pipeline for the B. rapa var. parachinensis genome is shown in Figure 1. DNA samples for PacBio, Illumina, and Hi-C sequencing were all prepared from a single plant to avoid potential genomic variation between plants. Overall, we obtained 113 Gb of PacBio and 47.5 Gb of Illumina raw reads (Table S2), corresponding to approximately 219× and 86× depth of the estimated genome size (515 Mb), respectively. A preliminary survey of the genome size, heterozygosity, GC content, and transposable element (TE) content of this inbred line was carried out with 32 Gb of clean Illumina reads (Table 1; ~83× coverage) using a k-mer-based method. The genome size was estimated to be about 515 Mb, with an overall GC content of 38.9% and a TE content of 64.1% (Table S2). The heterozygosity was very low (only 0.16%), which facilitates assembly.
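A common k-mer-based genome size estimate divides the number of solid (non-error) k-mers by the depth of the main peak of the k-mer frequency histogram. The paper does not name its k-mer tool or parameters, so the following is a minimal sketch under those assumptions; the error cutoff is hypothetical, and estimating heterozygosity and repeat content additionally requires model fitting (as in GenomeScope), which is not shown:

```python
from typing import Dict


def estimate_genome_size(histogram: Dict[int, int], min_depth: int = 5) -> float:
    """Genome size ~ total solid k-mers / depth of the main k-mer peak.

    histogram maps k-mer depth -> number of distinct k-mers seen at that depth
    (the kind of table produced by `jellyfish histo`). Depths <= min_depth are
    treated as sequencing-error k-mers and ignored; min_depth is hypothetical.
    """
    solid = {d: n for d, n in histogram.items() if d > min_depth}
    if not solid:
        raise ValueError("no k-mers above the error cutoff")
    peak = max(solid, key=solid.get)  # depth with the most distinct k-mers
    total_kmers = sum(d * n for d, n in solid.items())
    return total_kmers / peak


toy = {1: 1000, 2: 300, 20: 50, 30: 100, 40: 50}
print(estimate_genome_size(toy))  # 200.0
```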
We applied a hybrid strategy to assemble the genome. First, long reads of at least 10 kb were assembled with the MECAT2 package [23]. Second, the assembly was polished with NGS short reads using Pilon [24]. The final contig assembly is 384 Mb with a contig N50 of 7.2 Mb; it contains 450 contigs, the longest of which is 19.9 Mb (Table 1). The GC content of the contigs is 37.6% (Table 1). Read coverage statistics calculated with SAMtools support the reliability of the assembly (Table S3). Furthermore, of the 1440 BUSCO genes, 97.8% were detected as complete and 0.8% as partial, which validates the completeness of the genome (Table S4).
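Contig and scaffold N50 values, such as those above, are defined as the length L for which sequences of length ≥ L together cover at least half of the assembly. A minimal Python implementation of that definition, applicable to either contig or scaffold lengths:

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


print(n50([10, 8, 5, 3, 2]))  # 8: 10 + 8 = 18 >= 28 / 2
```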
Next, high-throughput chromatin conformation capture (Hi-C) data were used to scaffold the contigs into a chromosome-level assembly. We obtained a total of 66 Gb of cleaned Hi-C paired-end (PE) reads, about 128× the estimated genome size. Of these, 98.27% (434 M/442 M) were mappable to the contig assembly, and ~33.18% (147 M/442 M) mapped to different contigs. Using the contact frequencies calculated from the PE reads, 180 contigs were anchored into 10 pseudochromosomes (Figure 2A). These 180 contigs represent 87.93% (338 Mb/384 Mb) of the total assembled sequence and 40% (180/450) of all contigs. The final assembly contains 69 scaffolds, with a scaffold N50 of 32 Mb and a longest scaffold of 47.5 Mb (Table 1). The Circos map of the genome shows that each genomic region is collinear with two other regions, consistent with the Brassica whole-genome triplication and indicating that the annotation is complete (Figure 2B). Large repeat-rich regions were identified on chromosomes A05 and A06 (Figure 2B), suggesting that these regions contain extensive DNA transposons and LTR retrotransposons.
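Scaffolding relies on the read pairs whose mates map to different contigs (about 33% here), because contigs that are adjacent in the genome share more such links. The sketch below only tallies these links from hypothetical (contig, contig) assignments; Juicer and 3D-DNA additionally normalize contact frequencies and order and orient the contigs:

```python
from collections import Counter
from typing import Iterable, Tuple


def contig_links(pairs: Iterable[Tuple[str, str]]) -> Tuple[Counter, float]:
    """Tally inter-contig links from Hi-C read pairs.

    pairs: (contig of read 1, contig of read 2) for each mapped pair.
    Returns link counts keyed by an unordered contig pair, plus the fraction
    of pairs whose mates fall on different contigs.
    """
    links: Counter = Counter()
    total = inter = 0
    for a, b in pairs:
        total += 1
        if a != b:
            inter += 1
            links[tuple(sorted((a, b)))] += 1  # (c1, c2) and (c2, c1) are the same link
    return links, (inter / total if total else 0.0)


toy_pairs = [("c1", "c1"), ("c1", "c2"), ("c2", "c1"), ("c2", "c3")]
links, frac = contig_links(toy_pairs)
print(links.most_common(1), frac)  # [(('c1', 'c2'), 2)] 0.75
```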
We also performed gene prediction with the MAKER pipeline [35], combining de novo prediction with homology evidence from related species, short-read transcriptome data, and full-length transcripts from the Iso-Seq sequencing performed in this study. We annotated 47,598 protein-coding genes in the Chinese flowering cabbage genome, with an average gene length of 2060 bp (Table 1). Genes contain on average 6.13 exons, with a mean exon length of 199 bp (Table 1). Approximately 53.2% of the genome is annotated as repetitive sequences, which is consistent with the k-mer-based estimate. LTR retrotransposons (22.26%) and DNA transposons (17.62%) are the most abundant classes (Table S5).
In conclusion, to our knowledge, this is the most contiguous genome assembly of this species to date.

3.2. Gene Duplication Analysis across 20 Eudicot Genomes Reveals That the Current B. rapa var. parachinensis Genome Is among the Highest-Quality Brassica Genome Assemblies

To assess the completeness of the genome assembly and gene models, we used OrthoFinder [42] to construct orthogroups across 20 eudicot species and classified them into three categories: orthogroups with a single-copy gene, with two genes, or with more than two genes. The frequencies of these categories across the 20 eudicot species show that Brassica species (i.e., B. napus, B. rapa, B. juncea, and B. nigra) harbor more duplicated orthologs than Arabidopsis species (Figure 3A,B), consistent with the extra whole-genome triplication (WGT) event that Brassica species experienced relative to the model plant Arabidopsis thaliana [6]. Moreover, more duplicated orthologs were identified in the current B. rapa var. parachinensis assembly than in two other assemblies of this species with relatively lower N50 values (Figure 3A), suggesting that our assembly is more contiguous and that its annotation captures more alternative splicing than previous studies [11,16]. BUSCO analysis indicated that all 12 Brassica genome assemblies are of high quality, with the current B. rapa var. parachinensis assembly having the highest BUSCO score (Figure 3B).
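The three copy-number categories can be tallied per species from orthogroup gene counts, such as those reported by OrthoFinder. A minimal sketch with a hypothetical input layout (species mapped to per-orthogroup gene counts); orthogroups absent from a species are skipped:

```python
from typing import Dict


def copy_number_categories(gene_counts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Per species, count orthogroups with 1, 2, or >2 genes (absent ones skipped).

    gene_counts maps species -> {orthogroup: number of genes in that species}.
    """
    summary = {}
    for species, per_og in gene_counts.items():
        tally = {"single": 0, "two": 0, "multi": 0}
        for n in per_og.values():
            if n == 1:
                tally["single"] += 1
            elif n == 2:
                tally["two"] += 1
            elif n > 2:
                tally["multi"] += 1
        summary[species] = tally
    return summary


toy = {"B_rapa": {"OG1": 1, "OG2": 3, "OG3": 3}, "A_thaliana": {"OG1": 1, "OG2": 1, "OG3": 0}}
print(copy_number_categories(toy))
```

Under a WGT history, a Brassica genome would be expected to shift counts from the "single" to the "two" and "multi" categories relative to Arabidopsis.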
Next, we compared the overlap of gene models among B. rapa var. parachinensis and two other B. rapa genomes [11,16]. A total of 19,042 genes are shared by all three genomes. The Chinese flowering cabbage genome has more genome-specific gene models (Figure 3C), which may reflect differences in assembly quality among the three genomes or a lineage-specific gene amplification history.

3.3. Phylogenetic Analysis of a Collection of Brassica Genomes Reveals That the Chinese Flowering Cabbage Has a Closer Evolutionary Relationship with the Diploid Progenitor of the Allotetraploid Species, B. juncea

The Brassicaceae family serves as a useful model for studying polyploidy and chromosome evolution. The evolutionary relationships among six ecologically important Brassica species, including three diploids (B. rapa, B. oleracea, and B. nigra) and three allotetraploids (B. napus, B. juncea, and B. carinata), are well described by the classical triangle of U model [2]. To determine the evolutionary distance between the current Chinese flowering cabbage genome and other Brassica genomes, we constructed a phylogenetic tree (Figure 4) for 12 Brassica genomes and eight related Brassicaceae species using the coding sequences of 434 single-copy genes present in all species. The three Brassica genome types are clearly separated from one another. The Chinese flowering cabbage has an AA genome that is closer to the AA subgenome of the allotetraploid B. juncea than to the AA genome of another B. rapa line, B. rapa var. pekinensis, suggesting that Chinese flowering cabbage is evolutionarily closer to the diploid progenitor of B. juncea. In the CC genome clade, B. oleracea var. capitata is sister to the two B. napus CC subgenomes, with B. oleracea var. italica as the next closest lineage, implying that B. oleracea var. capitata has a CC genome closer to the CC donor of B. napus. Similarly, B. rapa Z1 is sister first to the B. napus AA subgenome and then to the other AA genomes, indicating that it is evolutionarily closer to the AA progenitor of B. napus.

3.4. Extensive Chromosomal Arrangements between Brassica Species

Genome-wide synteny analysis was conducted using syntenic orthologous genes, both within B. rapa and between B. rapa and other Brassica species. First, the Chinese flowering cabbage genome was compared with two published assemblies of other B. rapa lines, B. rapa Z1 [11] and B. rapa var. pekinensis [16]. The SyMAP plot shows that the three B. rapa assemblies share a well-conserved overall genome architecture, except for a translocation between chromosomes 1 and 3 that distinguishes our assembly from the other two (Figure 5A). Next, we compared B. rapa var. parachinensis with two highly contiguous assemblies of the B. oleracea genome [6,11]. Besides the different chromosome numbers (B. rapa var. parachinensis, AA genome, n = 10; B. oleracea, CC genome, n = 9), we observed extensive chromosomal rearrangements between the two species (Figure 5B). Only two chromosomes (Chr1 and Chr2) show minimal changes since the species diverged from a common ancestor. The extensive chromosomal rearrangements during Brassica genome evolution contrast with observations in Oryza, a well-studied monocot genus in which the karyotype of most diploid species is well conserved even over 15 million years of evolution [43].

3.5. Genome Structure Evolution in Brassica: Insight from Pericentromeric Regions

The pericentromeric regions of plant genomes are among the most rapidly evolving genomic regions, and their evolution is largely driven by mechanisms such as LTR-retrotransposon proliferation, gene conversion, and segmental duplication [44]. Comparison of the pericentromeric regions among three B. rapa assemblies of different quality (Supplementary Figure S1E) revealed that the current assembly resolves a larger part of the pericentromeric repetitive regions than the other two assemblies (Supplementary Figure S1A–D). Much of the pericentromeric sequence is missing from the other two assemblies, especially the B. rapa var. pekinensis assembly. This result shows that highly contiguous genome assemblies are required for comparative genomic analysis of highly repetitive regions.
For interspecies comparison, we therefore selected highly contiguous assemblies of two closely related Brassica species, B. nigra and B. oleracea, which represent the other two Brassica genome types (BB and CC), and compared genome structure and sequence features in the pericentromeric regions of all chromosomes among the three species. We found that the pericentromeric regions of chromosomes 5 and 6 in B. rapa have experienced lineage-specific LTR-retrotransposon amplification. For example, comparison of chromosome 5 between B. rapa and B. nigra (Figure 6A) showed that B. rapa is clearly enriched in LTR retrotransposons relative to the orthologous pericentromeric region of B. nigra, even though synteny across the whole chromosome is well retained between the two species. This difference most likely reflects lineage-specific LTR-retrotransposon amplification since their divergence. In contrast, comparison between B. rapa and B. oleracea (Figure 6B) showed that synteny on chromosome 5 breaks at the centromere region (see also Figure 5B). This break more likely occurred in the B. oleracea lineage, because B. rapa shares the syntenic block with B. nigra (Figure 6A), whereas B. oleracea does not (Figure 6C). Thus, chromosome rearrangement may be an alternative cause of the different genome structure features observed in the pericentromeric regions. The comparison of chromosome 6 revealed a similar pattern (Figure 6D–F).

3.6. Structural Variants in Brassica Genomes

Structural variation (SV) is generally defined as genomic alterations 50 bp or larger, typically including insertions (INSs), deletions (DELs), duplications (DUPs), inversions (INVs), and translocations (TRAs). SVs can strongly affect the genes encoded in a genome and underlie diverse agronomically important traits. Compared with single nucleotide polymorphisms (SNPs) and short insertions and deletions (InDels), SVs are less explored because they are difficult to identify comprehensively with short reads. De novo genome assemblies, especially highly contiguous ones, facilitate in-depth genome-wide identification of all forms of structural variation. To the best of our knowledge, no study has yet identified SVs from highly contiguous genome assemblies of Brassica genomes. To close this knowledge gap and obtain a first view of SVs within Brassica rapa, we identified SVs between the genomes of B. rapa Z1 [11] and B. rapa var. parachinensis (this study), which have contig N50 values of 5.51 Mb and 7.26 Mb, respectively. As shown in Figure 5A, these two genomes differ by only a single translocation and show no other large chromosomal rearrangements. Using a whole-genome alignment approach, we identified 27,190 insertions, 26,002 deletions, 1374 duplications in the parachinensis assembly, 1368 duplications in the Z1 assembly, 46 medium-sized inversions ranging from 5.2 kb to 1431.6 kb, and 8565 complex SVs with imprecise breakpoints between Z1 and parachinensis (Figure 7A). Among the insertions, 845 and 847 are recent LTR insertions specific to the parachinensis and Z1 assemblies, respectively, consistent with their relatively recent estimated insertion times (Figure 7B). A large proportion of the detected insertions and deletions overlap annotated gene regions.
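The 50 bp size threshold and the overlap of SVs with gene regions can be checked with a short interval query. The sketch below is a hypothetical illustration, not the LASTZ-based pipeline; it assumes one chromosome, half-open gene intervals, and insertions represented by their insertion point in the reference:

```python
import bisect
from typing import List, Tuple


def svs_overlapping_genes(svs: List[Tuple[int, int, int]],
                          genes: List[Tuple[int, int]],
                          min_len: int = 50) -> Tuple[int, int]:
    """Count SVs >= min_len and how many of them overlap at least one gene.

    svs: (ref_start, ref_end, sv_length) on one chromosome; an insertion is
    represented by its insertion point, e.g. (pos, pos + 1, inserted_length).
    genes: half-open (start, end) intervals on the same chromosome.
    """
    genes = sorted(genes)
    starts = [s for s, _ in genes]
    prefix_max_end = []  # prefix_max_end[i] = max end among genes[:i + 1]
    running = float("-inf")
    for _, end in genes:
        running = max(running, end)
        prefix_max_end.append(running)
    kept = [(s, e) for s, e, length in svs if length >= min_len]
    hits = 0
    for s, e in kept:
        i = bisect.bisect_left(starts, e)  # genes[:i] start before the SV ends
        if i and prefix_max_end[i - 1] > s:  # and one of them ends after it starts
            hits += 1
    return len(kept), hits


genes = [(100, 200), (500, 900)]
svs = [(150, 151, 300),   # insertion inside gene 1
       (300, 400, 100),   # intergenic deletion
       (850, 1000, 150),  # deletion overlapping gene 2
       (120, 130, 10)]    # 10 bp indel: below the SV size threshold
print(svs_overlapping_genes(svs, genes))  # (3, 2)
```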
Figure 7C shows two local tandem duplications that overlap gene fragments or complete genes. Comparative genomic analysis can also provide insights into the mutational mechanisms that generate structural variations. For the 46 inversions identified, repeat sequences, especially inverted repeats, prevail in the flanking regions, suggesting a causal role of these sequence features in the formation of small inversions (Figure 7D). Taken together, our analysis of structural variations based on these highly contiguous genome assemblies provides a first view of SVs in Brassica genomes and of their functional significance for gene structure and thus, potentially, phenotype.
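One simple way to screen inversion flanks for inverted repeats is to look for a k-mer on one side whose reverse complement occurs on the other. The sketch below is a hypothetical exact-match screen with an illustrative k; the analysis in this study may have used alignment-based detection instead:

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_inverted_repeat(left_flank: str, right_flank: str, k: int = 12) -> bool:
    """True if some k-mer of the left flank occurs reverse-complemented in the right flank."""
    left, right = left_flank.upper(), right_flank.upper()
    right_kmers = {right[i:i + k] for i in range(len(right) - k + 1)}
    return any(revcomp(left[i:i + k]) in right_kmers
               for i in range(len(left) - k + 1))


# GACCAG on the left pairs with its reverse complement CTGGTC on the right.
print(has_inverted_repeat("TTTTGACCAGTTTT", "AAAACTGGTCAAAA", k=6))  # True
```

In practice, k would be chosen long enough (and flanks short enough) that chance matches are rare.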

4. Discussion

Chinese flowering cabbage (B. rapa var. parachinensis) is an important leafy and bolting-stem vegetable with high nutritional value that is widely grown in Asia [19]. Among the many ecotypes of Brassica rapa grown as vegetables in China, Chinese flowering cabbage is well adapted to the hot, humid climate of southern China. It can be planted year-round for its tender flowering shoots without a strict vernalization requirement. In this study, we report a chromosome-level genome assembly of this important B. rapa ecotype, which provides a valuable genomic resource for evolutionary studies of B. rapa and related Brassica species.
A highly contiguous genome assembly is critical for genome-wide marker development and gene model prediction. Numerous studies have demonstrated that long-read sequencing technologies can greatly improve assembly contiguity [3,11,13]. In this study, we used PacBio long reads to assemble the B. rapa var. parachinensis genome. Because of the low heterozygosity (0.16%) of the plant used for sequencing, we obtained a contig N50 of 7.26 Mb, which is longer than those of the two B. rapa genomes recently sequenced with PacBio and Nanopore technology [11,16,17] and much longer than those of the B. rapa and B. oleracea genomes sequenced with Illumina technology [3,6]. We used Hi-C to scaffold 338 Mb of contigs onto 10 chromosomes. The scaffold N50 of the final assembly reached 32.3 Mb, with a maximum scaffold length of 47.4 Mb, similar to the B. rapa Z1 genome sequenced with Nanopore technology [11] (Table S6). BUSCO analysis showed a genome completeness of 97.8%, which exceeds that of most related Brassica genomes sequenced so far, including B. oleracea HDEM [11], B. oleracea var. botrytis [12], and B. rapa Z1 [11] (Table S6).
In the present study, the Chinese flowering cabbage assembly resolved most of the pericentromeric regions of B. rapa. Among them, the pericentromeric regions of chromosomes 5 (A05) and 6 (A06) were significantly expanded relative to other pericentromeric regions and contained very few annotated genes (Figure 2B; Figure 6). This observation is further supported by the Hi-C contact map, in which the pericentromeric regions of chromosomes 5 and 6 show a clearly sparse contact signal, mostly caused by repetitive sequences (Figure 2A). Strikingly, this expansion appears to be lineage-specific, since we do not observe a similar pattern in the other two Brassica genome types, i.e., chromosomes C05 and C06 in B. oleracea and B. napus [11,13] and chromosomes B05 and B06 in B. nigra (Figure 6A). This lineage-specific expansion may have contributed to the evolutionary divergence of the Brassica AA, BB, and CC genomes. Notably, such large repetitive regions can only be resolved with long-read sequencing technology. For example, the previously published B. rapa Z1 and B. napus AA genome assemblies show a similar but weaker pattern than the current assembly [11,13,16] (Figure S1). In contrast, the B. rapa assembly sequenced with PacBio Sequel, with a contig N50 of 1.45 Mb [11,13,16], does not contain these large repetitive regions (Supplementary Figure S1E).
The genus Brassica contains three basic genomes, B. rapa (AA genome), B. nigra (BB genome), and B. oleracea (CC genome), which hybridized pairwise to give rise to three allopolyploid species: B. napus (AACC genome), B. juncea (AABB genome), and B. carinata (BBCC genome) [2,12]. In the present study, we constructed a phylogenetic tree to analyze the evolution of the Brassica species. Interestingly, the Chinese flowering cabbage is most closely related to the B. juncea AA subgenome rather than to the two other B. rapa genomes (Chinese cabbage and yellow sarson) (Figure 4) [11,16]. The B. rapa species can be further subdivided into six populations: turnips (Chinese and European turnips), sarsons (sarson, rapid cycling, and spring/winter oilseed), turnip rapes, taicai and mixed Japanese morphotypes, pak choi (pak choi, wutacai, Chinese flowering cabbage, and zicaitai varieties), and heading Chinese cabbages [2]. Our results suggest that the AA genome donor of B. juncea most likely belonged to the pak choi group (which includes Chinese flowering cabbage) rather than to other B. rapa groups such as sarsons and turnips [11,31]. Meanwhile, B. rapa Z1 (sarson) was sister first to the B. napus AA subgenome and then to the other AA genomes, implying that, among the genomes analyzed, it is evolutionarily closest to the AA genome donor of B. napus. Similarly, B. oleracea can be subdivided into seven populations: kohlrabies, Chinese kale, cauliflower, broccoli, Brussels sprouts, kale, and cabbages [2]. B. oleracea var. capitata (cabbage) was sister first to the two B. napus CC subgenomes and then to B. oleracea var. italica (broccoli), implying that the CC genome donor of B. napus probably evolved from B. oleracea var. capitata (Figure 4). These results show that high-continuity genome assemblies can aid in interpreting the evolutionary relationships among Brassica species.
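Statements such as "sister first to X and then to Y" describe the sequence of sister clades encountered when walking from a leaf toward the root of a rooted tree. The Python sketch below encodes that idea with nested tuples. The two toy topologies are hypothetical: they encode only the relationships stated in the paragraph above, are not a transcription of Figure 4, and the taxon labels are illustrative.

```python
from typing import Union

# A tree is either a leaf name (str) or a tuple of child subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> set:
    """Collect all leaf names below a node."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def sister_sequence(tree: Tree, taxon: str) -> list:
    """Return the leaf sets of successive sister clades of `taxon`,
    ordered from the nearest (first sister) to the farthest (near the root)."""
    sisters = []

    def find(node: Tree) -> bool:
        if isinstance(node, str):
            return node == taxon
        for i, child in enumerate(node):
            if find(child):
                # Recursion unwinds from the leaf upwards, so the nearest
                # sister clade is appended first.
                others = [c for j, c in enumerate(node) if j != i]
                sisters.append(set().union(*(leaves(c) for c in others)))
                return True
        return False

    if not find(tree):
        raise ValueError(f"taxon {taxon!r} not found in tree")
    return sisters


# Hypothetical AA topology: Z1 sister first to the B. napus A subgenome,
# then to the other AA genomes.
aa_tree = (("Z1_sarson", "Bnapus_A"), ("parachinensis", "Bjuncea_A"))
print(sister_sequence(aa_tree, "Z1_sarson"))

# Hypothetical CC topology: capitata sister first to the two B. napus
# C subgenomes, then to var. italica.
cc_tree = (("capitata", ("Bnapus_C_1", "Bnapus_C_2")), "italica")
print(sister_sequence(cc_tree, "capitata"))
```

Representing each sister as the set of leaves it contains makes the comparison independent of how the subtree is internally resolved.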
Numerous studies have found that structural variations (SVs) can affect larger genomic regions than SNPs. SV discovery not only improves our understanding of the landscape of genomic variation within and between species but can also reveal the functional significance of SVs [45]. Compared with SV detection methods based on Illumina short reads, whole-genome assembly-based methods can in theory recover SVs completely, although their results still depend on assembly quality. SV studies in humans [46,47] and in a wide range of plant species, such as rice [45], maize [48], tomato [48,49], Arabidopsis [49], and Brassica rapa [50], indicate that SVs can affect a large proportion of coding genes. In the current study, we detected SVs between the genome assemblies of two Brassica rapa lines (Z1 and var. parachinensis) and identified a total of 27,190 insertions, 26,002 deletions, 1368 duplications, 46 medium-sized inversions ranging from 5.2 kb to 1431.6 kb, and 8565 complex SVs with imprecise breakpoints (Figure 7). These SVs may affect coding genes and thereby contribute to phenotypic variation, such as in morphological and phytochemical traits.
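To see how SV calls translate into "affected genes", one can intersect SV intervals with gene intervals on the same chromosome. The Python sketch below does this with 0-based, half-open coordinates and also totals the SV counts reported above (they sum to 63,171). The `Interval` record, its field names, and the toy coordinates are hypothetical; this is an illustration of interval overlap, not the authors' SV-calling or annotation pipeline.

```python
from dataclasses import dataclass

# SV counts reported in the text for Z1 vs. var. parachinensis.
REPORTED = {
    "insertion": 27_190,
    "deletion": 26_002,
    "duplication": 1_368,
    "inversion": 46,
    "complex": 8_565,
}


@dataclass(frozen=True)
class Interval:
    """Hypothetical record: 0-based, half-open [start, end) on a chromosome.
    An insertion is represented by start == end (a point on the reference)."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    # Same chromosome and overlapping; an insertion point counts as hitting
    # a gene only if it lies strictly inside the gene.
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def genes_hit(svs, genes):
    """Map gene name -> list of SV names overlapping it (naive O(n*m) scan)."""
    hits = {}
    for sv in svs:
        for gene in genes:
            if overlaps(sv, gene):
                hits.setdefault(gene.name, []).append(sv.name)
    return hits


print(sum(REPORTED.values()))  # 63171
svs = [
    Interval("INS_1", "A01", 100, 100),
    Interval("DEL_1", "A01", 500, 800),
    Interval("DUP_1", "A02", 0, 50),
]
genes = [
    Interval("gene_a", "A01", 50, 150),
    Interval("gene_b", "A01", 900, 1000),
    Interval("gene_c", "A02", 40, 60),
]
print(genes_hit(svs, genes))  # {'gene_a': ['INS_1'], 'gene_c': ['DUP_1']}
```

For genome-scale call sets, the nested loop would be replaced by a sorted sweep, an interval tree, or a dedicated tool such as BEDTools.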
In summary, we report a chromosome-level genome assembly of the Chinese flowering cabbage together with accurate gene and transposable element (TE) annotations. Phylogenetic analysis indicates that this genome is more closely related to the AA diploid progenitor of B. juncea than the other B. rapa genomes analyzed. We also found lineage-specific pericentromeric expansions on chromosomes 5 and 6 of the Brassica AA genome compared with the orthologous genomic regions in the Brassica BB and CC genomes. Finally, we report a large number of structural variations (SVs) between two B. rapa lines (Z1 and var. parachinensis) identified using high-continuity genome assemblies. Overall, our high-quality genome assembly of the Chinese flowering cabbage provides a valuable genetic resource for deciphering the genome evolution of Brassica species, and it could serve as a reference genome to guide the molecular breeding of B. rapa crops.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants12132498/s1, Figure S1: Comparative synteny analysis of the pericentromeric regions of Chr05 and Chr06 among three B. rapa lines; Table S1: The 20 eudicot species and references; Table S2: Statistics of sample genome characteristics (K-mer = 17); Table S3: Coverage statistics of the Chinese flowering cabbage genome using SAMtools; Table S4: Assessment of the completeness of the Chinese flowering cabbage genome assembly by BUSCO (Benchmarking Universal Single-Copy Orthologs); Table S5: Distribution of repeat sequences in the Chinese flowering cabbage genome; Table S6: Comparison of the genome assembly between Chinese flowering cabbage (B. rapa var. parachinensis) and other representative Brassica plants.

Author Contributions

Y.Z., C.C. and Y.L. designed the project and wrote the draft manuscript. G.L., Y.L., J.W., T.Z. and D.J. contributed to the genome assembly, genome evolution analysis, and structural variant analysis. H.Z., X.D., H.R. and T.Z. participated in data analysis and substantively revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Key-Area Research and Development Program of Guangdong Province (2022B0202080001) and the Science and Technology Program of Guangzhou (202201010854, 202206010173, 2023B03J1270).

Data Availability Statement

The raw genome sequencing, RNA sequencing, and Hi-C data were deposited in the China National GeneBank Data Base (CNGBdb) under Bioproject number CNP0001121. The final chromosome-level assembly was submitted to CNGBdb under the same Bioproject.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lysak, M.A.; Koch, M.A.; Pecinka, A.; Schubert, I. Chromosome Triplication Found across the Tribe Brassiceae. Genome Res. 2005, 15, 516–525.
  2. Cheng, F.; Sun, R.; Hou, X.; Zheng, H.; Zhang, F.; Zhang, Y.; Liu, B.; Liang, J.; Zhuang, M.; Liu, Y.; et al. Subgenome Parallel Selection Is Associated with Morphotype Diversification and Convergent Crop Domestication in Brassica Rapa and Brassica Oleracea. Nat. Genet. 2016, 48, 1218–1224.
  3. Wang, W.; Guan, R.; Liu, X.; Zhang, H.; Song, B.; Xu, Q.; Fan, G.; Chen, W.; Wu, X.; Liu, X.; et al. Chromosome Level Comparative Analysis of Brassica Genomes. Plant Mol. Biol. 2019, 99, 237–249.
  4. Yang, J.; Liu, D.; Wang, X.; Ji, C.; Cheng, F.; Liu, B.; Hu, Z.; Chen, S.; Pental, D.; Ju, Y.; et al. Author Correction: The Genome Sequence of Allopolyploid Brassica Juncea and Analysis of Differential Homoeolog Gene Expression Influencing Selection. Nat. Genet. 2018, 50, 1616.
  5. Wang, X.; Wang, H.; Wang, J.; Sun, R.; Wu, J.; Liu, S.; Bai, Y.; Mun, J.-H.; Bancroft, I.; Cheng, F.; et al. The Genome of the Mesopolyploid Crop Species Brassica Rapa. Nat. Genet. 2011, 43, 1035–1039.
  6. Liu, S.; Liu, Y.; Yang, X.; Tong, C.; Edwards, D.; Parkin, I.A.P.; Zhao, M.; Ma, J.; Yu, J.; Huang, S.; et al. The Brassica Oleracea Genome Reveals the Asymmetrical Evolution of Polyploid Genomes. Nat. Commun. 2014, 5, 3930.
  7. Parkin, I.A.P.; Koh, C.; Tang, H.; Robinson, S.J.; Kagale, S.; Clarke, W.E.; Town, C.D.; Nixon, J.; Krishnakumar, V.; Bidwell, S.L.; et al. Transcriptome and Methylome Profiling Reveals Relics of Genome Dominance in the Mesopolyploid Brassica Oleracea. Genome Biol. 2014, 15, R77.
  8. Chalhoub, B.; Denoeud, F.; Liu, S.; Parkin, I.A.P.; Tang, H.; Wang, X.; Chiquet, J.; Belcram, H.; Tong, C.; Samans, B.; et al. Plant Genetics. Early Allopolyploid Evolution in the Post-Neolithic Brassica Napus Oilseed Genome. Science 2014, 345, 950–953.
  9. Bayer, P.E.; Hurgobin, B.; Golicz, A.A.; Chan, C.-K.K.; Yuan, Y.; Lee, H.; Renton, M.; Meng, J.; Li, R.; Long, Y.; et al. Assembly and Comparison of Two Closely Related Brassica Napus Genomes. Plant Biotechnol. J. 2017, 15, 1602–1610.
  10. Sun, F.; Fan, G.; Hu, Q.; Zhou, Y.; Guan, M.; Tong, C.; Li, J.; Du, D.; Qi, C.; Jiang, L.; et al. The High-Quality Genome of Brassica Napus Cultivar “ZS11” Reveals the Introgression History in Semi-Winter Morphotype. Plant J. 2017, 92, 452–468.
  11. Belser, C.; Istace, B.; Denis, E.; Dubarry, M.; Baurens, F.-C.; Falentin, C.; Genete, M.; Berrabah, W.; Chèvre, A.-M.; Delourme, R.; et al. Chromosome-Scale Assemblies of Plant Genomes Using Nanopore Long Reads and Optical Maps. Nat Plants 2018, 4, 879–887.
  12. Sun, D.; Wang, C.; Zhang, X.; Zhang, W.; Jiang, H.; Yao, X.; Liu, L.; Wen, Z.; Niu, G.; Shan, X. Draft Genome Sequence of Cauliflower (Brassica oleracea L. var. botrytis) Provides New Insights into the C Genome in Brassica Species. Hortic Res 2019, 6, 82.
  13. Song, J.-M.; Guan, Z.; Hu, J.; Guo, C.; Yang, Z.; Wang, S.; Liu, D.; Wang, B.; Lu, S.; Zhou, R.; et al. Eight High-Quality Genomes Reveal Pan-Genome Architecture and Ecotype Differentiation of Brassica Napus. Nat Plants 2020, 6, 34–45.
  14. Lou, P.; Woody, S.; Greenham, K.; VanBuren, R.; Colle, M.; Edger, P.P.; Sartor, R.; Zheng, Y.; Levendoski, N.; Lim, J.; et al. Genetic and Genomic Resources to Study Natural Variation in. Plant Direct 2020, 4, e00285.
  15. Wei, X.; Rahim, M.A.; Zhao, Y.; Yang, S.; Wang, Z.; Su, H.; Li, L.; Niu, L.; Harun-Ur-Rashid, M.; Yuan, Y.; et al. Comparative Transcriptome Analysis of Early- and Late-Bolting Traits in Chinese Cabbage (). Front. Genet. 2021, 12, 590830.
  16. Zhang, L.; Cai, X.; Wu, J.; Liu, M.; Grob, S.; Cheng, F.; Liang, J.; Cai, C.; Liu, Z.; Liu, B.; et al. Erratum: Author Correction: Improved Reference Genome by Single-Molecule Sequencing and Chromosome Conformation Capture Technologies. Hortic Res 2019, 6, 124.
  17. Li, P.; Su, T.; Zhao, X.; Wang, W.; Zhang, D.; Yu, Y.; Bayer, P.E.; Edwards, D.; Yu, S.; Zhang, F. Assembly of the Non-Heading Pak Choi Genome and Comparison with the Genomes of Heading Chinese Cabbage and the Oilseed Yellow Sarson. Plant Biotechnol. J. 2021, 19, 966–976.
  18. Xu, H.; Wang, C.; Shao, G.; Wu, S.; Liu, P.; Cao, P.; Jiang, P.; Wang, S.; Zhu, H.; Lin, X.; et al. The Reference Genome and Full-Length Transcriptome of Pakchoi Provide Insights into Cuticle Formation and Heat Adaption. Hortic Res 2022, 9, uhac123.
  19. Tan, X.-L.; Fan, Z.-Q.; Kuang, J.-F.; Lu, W.-J.; Reiter, R.J.; Lakshmanan, P.; Su, X.-G.; Zhou, J.; Chen, J.-Y.; Shan, W. Melatonin Delays Leaf Senescence of Chinese Flowering Cabbage by Suppressing ABFs-Mediated Abscisic Acid Biosynthesis and Chlorophyll Degradation. J. Pineal Res. 2019, 67, e12570.
  20. Xiao, X.-M.; Xu, Y.-M.; Zeng, Z.-X.; Tan, X.-L.; Liu, Z.-L.; Chen, J.-W.; Su, X.-G.; Chen, J.-Y. Activation of the Transcription of by a BrTCP21 Transcription Factor Is Associated with Gibberellin-Delayed Leaf Senescence in Chinese Flowering Cabbage during Storage. Int. J. Mol. Sci. 2019, 20, 3860.
  21. Kamran, M.; Xie, K.; Sun, J.; Wang, D.; Shi, C.; Lu, Y.; Gu, W.; Xu, P. Modulation of Growth Performance and Coordinated Induction of Ascorbate-Glutathione and Methylglyoxal Detoxification Systems by Salicylic Acid Mitigates Salt Toxicity in Choysum (Brassica Parachinensis L.). Ecotoxicol. Environ. Saf. 2020, 188, 109877.
  22. Yang, X.; Liu, D.; Liu, F.; Wu, J.; Zou, J.; Xiao, X.; Zhao, F.; Zhu, B. HTQC: A Fast Quality Control Toolkit for Illumina Sequencing Data. BMC Bioinformatics 2013, 14, 33.
  23. Xiao, C.-L.; Chen, Y.; Xie, S.-Q.; Chen, K.-N.; Wang, Y.; Han, Y.; Luo, F.; Xie, Z. MECAT: Fast Mapping, Error Correction, and de Novo Assembly for Single-Molecule Sequencing Reads. Nat. Methods 2017, 14, 1072–1074.
  24. Walker, B.J.; Abeel, T.; Shea, T.; Priest, M.; Abouelliel, A.; Sakthikumar, S.; Cuomo, C.A.; Zeng, Q.; Wortman, J.; Young, S.K.; et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLoS One 2014, 9, e112963.
  25. Benson, G. Tandem Repeats Finder: A Program to Analyze DNA Sequences. Nucleic Acids Res. 1999, 27, 573–580.
  26. Seppey, M.; Manni, M.; Zdobnov, E.M. BUSCO: Assessing Genome Assembly and Annotation Completeness. Methods Mol. Biol. 2019, 1962, 227–245.
  27. Yang, X.; Liu, H.; Ma, Z.; Zou, Y.; Zou, M.; Mao, Y.; Li, X.; Wang, H.; Chen, T.; Wang, W.; et al. Chromosome-Level Genome Assembly of Triplophysa Tibetana, a Fish Adapted to the Harsh High-Altitude Environment of the Tibetan Plateau. Mol. Ecol. Resour. 2019, 19, 1027–1036.
  28. Durand, N.C.; Shamim, M.S.; Machol, I.; Rao, S.S.P.; Huntley, M.H.; Lander, E.S.; Aiden, E.L. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst 2016, 3, 95–98.
  29. Dudchenko, O.; Batra, S.S.; Omer, A.D.; Nyquist, S.K.; Hoeger, M.; Durand, N.C.; Shamim, M.S.; Machol, I.; Lander, E.S.; Aiden, A.P.; et al. De Novo Assembly of the Genome Using Hi-C Yields Chromosome-Length Scaffolds. Science 2017, 356, 92–95.
  30. Wang, Y.; Tang, H.; Debarry, J.D.; Tan, X.; Li, J.; Wang, X.; Lee, T.-H.; Jin, H.; Marler, B.; Guo, H.; et al. MCScanX: A Toolkit for Detection and Evolutionary Analysis of Gene Synteny and Collinearity. Nucleic Acids Res. 2012, 40, e49.
  31. Cai, C.; Wang, X.; Liu, B.; Wu, J.; Liang, J.; Cui, Y.; Cheng, F.; Wang, X. Brassica Rapa Genome 2.0: A Reference Upgrade through Sequence Re-Assembly and Gene Re-Annotation. Mol. Plant 2017, 10, 649–651.
  32. Vasimuddin, M.; Misra, S.; Li, H.; Aluru, S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. In Proceedings of the 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, 20–24 May 2019.
  33. Wolff, J.; Bhardwaj, V.; Nothjunge, S.; Richard, G.; Renschler, G.; Gilsbach, R.; Manke, T.; Backofen, R.; Ramírez, F.; Grüning, B.A. Galaxy HiCExplorer: A Web Server for Reproducible Hi-C Data Analysis, Quality Control and Visualization. Nucleic Acids Res. 2018, 46, W11–W16.
  34. Ou, S.; Su, W.; Liao, Y.; Chougule, K.; Agda, J.R.A.; Hellinga, A.J.; Lugo, C.S.B.; Elliott, T.A.; Ware, D.; Peterson, T.; et al. Author Correction: Benchmarking Transposable Element Annotation Methods for Creation of a Streamlined, Comprehensive Pipeline. Genome Biol. 2022, 23, 76.
  35. Cantarel, B.L.; Korf, I.; Robb, S.M.C.; Parra, G.; Ross, E.; Moore, B.; Holt, C.; Sánchez Alvarado, A.; Yandell, M. MAKER: An Easy-to-Use Annotation Pipeline Designed for Emerging Model Organism Genomes. Genome Res. 2008, 18, 188–196.
  36. Krzywinski, M.; Schein, J.; Birol, I.; Connors, J.; Gascoyne, R.; Horsman, D.; Jones, S.J.; Marra, M.A. Circos: An Information Aesthetic for Comparative Genomics. Genome Res. 2009, 19, 1639–1645.
  37. Schwartz, S.; Kent, W.J.; Smit, A.; Zhang, Z.; Baertsch, R.; Hardison, R.C.; Haussler, D.; Miller, W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003, 13, 103–107.
  38. Kent, W.J.; Baertsch, R.; Hinrichs, A.; Miller, W.; Haussler, D. Evolution’s Cauldron: Duplication, Deletion, and Rearrangement in the Mouse and Human Genomes. Proc. Natl. Acad. Sci. U. S. A. 2003, 100, 11484–11489.
  39. Harris, R.S. Improved Pairwise Alignment of Genomic DNA; Pennsylvania State University: University Park, PA, USA, 2007.
  40. Liao, Y.; Zhang, X.; Chakraborty, M.; Emerson, J.J. Topologically Associating Domains and Their Role in the Evolution of Genome Structure and Function in. Genome Res. 2021, 31, 397–410.
  41. Koch, M.A.; Haubold, B.; Mitchell-Olds, T. Comparative Evolutionary Analysis of Chalcone Synthase and Alcohol Dehydrogenase Loci in Arabidopsis, Arabis, and Related Genera (Brassicaceae). Mol. Biol. Evol. 2000, 17, 1483–1498.
  42. Emms, D.M.; Kelly, S. OrthoFinder: Solving Fundamental Biases in Whole Genome Comparisons Dramatically Improves Orthogroup Inference Accuracy. Genome Biol. 2015, 16, 157.
  43. Stein, J.C.; Yu, Y.; Copetti, D.; Zwickl, D.J.; Zhang, L.; Zhang, C.; Chougule, K.; Gao, D.; Iwata, A.; Goicoechea, J.L.; et al. Publisher Correction: Genomes of 13 Domesticated and Wild Rice Relatives Highlight Genetic Conservation, Turnover and Innovation across the Genus Oryza. Nat. Genet. 2018, 50, 1618.
  44. Liao, Y.; Zhang, X.; Li, B.; Liu, T.; Chen, J.; Bai, Z.; Wang, M.; Shi, J.; Walling, J.G.; Wing, R.A.; et al. Comparison of Oryza Sativa and Oryza Brachyantha Genomes Reveals Selection-Driven Gene Escape from the Centromeric Regions. Plant Cell 2018, 30, 1729–1744.
  45. Fuentes, R.R.; Chebotarov, D.; Duitama, J.; Smith, S.; De la Hoz, J.F.; Mohiyuddin, M.; Wing, R.A.; McNally, K.L.; Tatarinova, T.; Grigoriev, A.; et al. Structural Variants in 3000 Rice Genomes. Genome Res. 2019, 29, 870–880.
  46. Huang, C.R.L.; Schneider, A.M.; Lu, Y.; Niranjan, T.; Shen, P.; Robinson, M.A.; Steranka, J.P.; Valle, D.; Civin, C.I.; Wang, T.; et al. Mobile Interspersed Repeats Are Major Structural Variants in the Human Genome. Cell 2010, 141, 1171–1182.
  47. Audano, P.A.; Sulovari, A.; Graves-Lindsay, T.A.; Cantsilieris, S.; Sorensen, M.; Welch, A.E.; Dougherty, M.L.; Nelson, B.J.; Shah, A.; Dutcher, S.K.; et al. Characterizing the Major Structural Variant Alleles of the Human Genome. Cell 2019, 176, 663–675.e19.
  48. Mahmoud, M.; Gracz-Bernaciak, J.; Żywicki, M.; Karłowski, W.; Twardowski, T.; Tyczewska, A. Identification of Structural Variants in Two Novel Genomes of Maize Inbred Lines Possibly Related to Glyphosate Tolerance. Plants 2020, 9, 523.
  49. Voichek, Y.; Weigel, D. Identifying Genetic Variants Underlying Phenotypic Variation in Plants without Complete Genomes. Nat. Genet. 2020, 52, 534–540.
  50. Cai, X.; Chang, L.; Zhang, T.; Chen, H.; Zhang, L.; Lin, R.; Liang, J.; Wu, J.; Freeling, M.; Wang, X. Impacts of Allopolyploidization and Structural Variation on Intraspecific Diversification in Brassica Rapa. Genome Biol. 2021, 22, 166.
Figure 1. Overview of the assembly pipeline for the Brassica rapa var. parachinensis genome. The steps include assembly of PacBio reads, scaffolding with Hi-C data, extensive quality control (QC) using high-coverage Illumina short reads, de novo repeat annotation, and gene annotation using Iso-Seq data.
Figure 2. A highly continuous genome assembly of the Chinese flowering cabbage (B. rapa var. parachinensis). (A) Hi-C contact map of the assembled Chinese flowering cabbage chromosomes; the density of Hi-C contacts is highest along the diagonal, indicating consistency between the assembly and the Hi-C map; blue squares indicate highly repetitive pericentromeric regions on chromosomes A05 and A06. (B) Circos diagram of sequence features on the chromosomes of B. rapa var. parachinensis; A01–A10 indicate the ten assembled chromosomes. Tracks in the Circos plot from outer to inner represent a: chromosomes; b: genes; c: DNA-type TEs; d: LTR retrotransposons; e: tandem repeats; f: synteny.
Figure 3. Distribution of genes in B. rapa var. parachinensis and other representative plant species. (A) Distribution of ortholog groups: single-copy (blue), two-copy (orange), and multi-copy (grey) across 20 eudicot genomes; (B) BUSCO assessment of the 20 eudicot genome assemblies; (C) Venn diagram showing the overlap of gene families among Chinese flowering cabbage and two other B. rapa assemblies.
Figure 4. The phylogenetic relationship of B. rapa var. parachinensis with other Brassicaceae plants.
Figure 5. Genome synteny based on orthologous genes within and between species for B. rapa var. parachinensis. (A) Genome synteny between B. rapa var. parachinensis and two other B. rapa genome assemblies (B. rapa Z1 [11] and B. rapa var. pekinensis [16]); (B) Genome synteny between B. rapa var. parachinensis and two highly continuous assemblies of the B. oleracea genome (B. oleracea var. capitata [6,11] and B. oleracea var. italica [11]). Homologous chromosomes are labelled with the same number; “1–10” represent chromosomes 1–10, respectively.
Figure 6. Comparative analysis of sequence features and synteny at the pericentromeric regions of chromosomes 5 and 6 among three Brassica genome types: Chinese flowering cabbage (AA genome), B. nigra (BB genome), and B. oleracea (CC genome). (A) Synteny map of Chr05 between B. nigra (BB genome) and B. rapa var. parachinensis (AA genome); (B) Synteny map of Chr05 between B. oleracea (CC genome) and B. rapa var. parachinensis (AA genome); (C) Synteny map of Chr05 between B. oleracea (CC genome) and B. nigra (BB genome); (D) Synteny map of Chr06 between B. nigra (BB genome) and B. rapa var. parachinensis (AA genome); (E) Synteny map of Chr03 of B. oleracea (CC genome) and Chr06 of B. rapa var. parachinensis (AA genome); (F) Synteny map of Chr03 of B. oleracea (CC genome) and Chr06 of B. nigra (BB genome). Tracks in the Circos plot from outer to inner represent a: chromosomes; b: genes; c: DNA-type TEs; d: LTR retrotransposons; e: tandem repeats; f: synteny.
Figure 7. Structural variations between two B. rapa lines. (A) Total number of structural variations identified using the highly continuous assemblies of Brassica rapa Z1 and Brassica rapa var. parachinensis. TD_Z1, tandem duplications in the Z1 assembly relative to the parachinensis assembly; TD_pare, vice versa. Complex SVs are those with imprecise breakpoints. (B) Distribution of LTR-retrotransposon insertion times in three highly continuous Brassica genome assemblies. (C) Examples of tandem duplications affecting genes. (D) Example of a medium-sized genomic inversion between Brassica rapa Z1 and Brassica rapa var. parachinensis; such inversions are prevalent in Brassica genome evolution.
Table 1. Statistics and annotated analysis of the Chinese flowering cabbage genome assembly.
Item | Number | Size | Sequence coverage (X) | Percentage (%)
Estimated genome size | | 515 Mb | |
PacBio reads | 4,448,280 | 113,068 Mb | 219.31 |
PacBio reads N50 | | 28,414 bp | |
80X PacBio reads N50 | | 43,902 bp | |
Illumina reads | 322,016,292 | 42,330 Mb | 82.10 |
Hi-C reads | 441,545,786 | 66,231 Mb | 128.46 |
Total reads | | 221,630 Mb | 429.89 |
Contigs | 450 | 384 Mb | | 74.50
N50 of contigs | | 7.2 Mb | |
Longest contig | | 19.9 Mb | |
Scaffolds | 69 | 384 Mb | | 74.62
N50 of scaffolds | | 32.2 Mb | |
Longest scaffold | | 47.5 Mb | |
GC content | | 144.4 Mb | | 37.61
Total repetitive sequences | | 170.3 Mb | | 44.26
Total protein-coding genes | 47,598 | 47.3 Mb | | 12.31
Average length per gene | | 2060 bp | |
Average exons per gene | 6.13 | 199 bp | |
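The sequence-coverage column in Table 1 is consistent with depth = amount of sequence data / estimated genome size. The Python sketch below recomputes it from the table's own values. The small (about 0.1%) differences are what one would expect if the 515 Mb estimate shown in the table was rounded; that is an assumption, not something stated in the source, and the function names are hypothetical.

```python
# Values transcribed from Table 1: (sequence data in Mb, reported coverage in X).
ESTIMATED_GENOME_MB = 515  # k-mer-based estimate as shown (rounded) in Table 1
READ_SETS = {
    "PacBio": (113_068, 219.31),
    "Illumina": (42_330, 82.10),
    "Hi-C": (66_231, 128.46),
    "Total": (221_630, 429.89),
}


def coverage(data_mb: float, genome_mb: float = ESTIMATED_GENOME_MB) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    return data_mb / genome_mb


def check_table(rel_tol: float = 0.005) -> dict:
    """Return {read set: (recomputed X, reported X, within tolerance?)}."""
    report = {}
    for name, (data_mb, reported) in READ_SETS.items():
        computed = coverage(data_mb)
        within = abs(computed - reported) / reported <= rel_tol
        report[name] = (round(computed, 2), reported, within)
    return report


for name, row in check_table().items():
    print(name, row)
```

A relative rather than absolute tolerance is used so that the same check works for both the 82X Illumina set and the 430X combined total.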
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, G.; Jiang, D.; Wang, J.; Liao, Y.; Zhang, T.; Zhang, H.; Dai, X.; Ren, H.; Chen, C.; Zheng, Y. A High-Continuity Genome Assembly of Chinese Flowering Cabbage (Brassica rapa var. parachinensis) Provides New Insights into Brassica Genome Structure Evolution. Plants 2023, 12, 2498. https://doi.org/10.3390/plants12132498

AMA Style

Li G, Jiang D, Wang J, Liao Y, Zhang T, Zhang H, Dai X, Ren H, Chen C, Zheng Y. A High-Continuity Genome Assembly of Chinese Flowering Cabbage (Brassica rapa var. parachinensis) Provides New Insights into Brassica Genome Structure Evolution. Plants. 2023; 12(13):2498. https://doi.org/10.3390/plants12132498

Chicago/Turabian Style

Li, Guangguang, Ding Jiang, Juntao Wang, Yi Liao, Ting Zhang, Hua Zhang, Xiuchun Dai, Hailong Ren, Changming Chen, and Yansong Zheng. 2023. "A High-Continuity Genome Assembly of Chinese Flowering Cabbage (Brassica rapa var. parachinensis) Provides New Insights into Brassica Genome Structure Evolution" Plants 12, no. 13: 2498. https://doi.org/10.3390/plants12132498

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.