Transcriptome Proﬁling Reveals Candidate Key Genes Involved in Sinigrin Biosynthesis in Brassica nigra

: Glucosinolates (GSLs) are important secondary metabolites in Brassicales related to insect and disease resistance, ﬂavor formation, and human health. Here, we determined the GSL proﬁle with sinigrin as the predominant GSL in Brassica nigra . A total of 184 GSL biosynthetic genes ( BniGSLs ) were identiﬁed in B. nigra by a genome-wide search for orthologs of 82 of the 95 known GSL genes in Arabidopsis thaliana . Transcriptome data demonstrated that at least one BniGSL was highly expressed in stems and leaves at each step of the sinigrin synthesis pathway, which ensured the synthesis of a large amount of sinigrin in B. nigra . Among these key candidates of BniGSLs , the high expression of BniMAM1-2 , BniCYP79F1 , and BniAOP2-1/2 , and the absence of MAM3 and AOP3 , may contribute remarkably to the synthesis and accumulation of sinigrin. In addition, the low expression of some key BniGSLs partially explains the low content of indolic and aromatic GSLs in B. nigra . This study provided a genetic explanation for the formation of the unique GSL proﬁle with sinigrin as the main GSL in B. nigra . The results of this study will be valuable for further functional analysis of BniGSLs and genetic improvement of GSLs in B. nigra and other Brassica species. compared using


Introduction
Glucosinolates (GSLs) are a group of sulfur-rich and nitrogen-containing secondary metabolites that are synthesized from amino acids and sugars in plants. Currently, over 100 different GSLs have been identified [1], most of which are exclusively found in the order Brassicales [2]. GSL is composed of three common moieties, including a β-D-thioglucose group, a sulfonated aldoxime moiety, and a variable side chain derived from a precursor amino acid [3,4]. Upon hydrolysis by the myrosinase enzyme, GSLs are degraded into different bioactive products, mainly isothiocyanates [5]. These broken down products exhibit a variety of biological activities, which not only endow Brassica vegetables with characteristic flavor [6] and help defend against pathogens and insect herbivores [7], but also function in preventing carcinogenesis in animals by stimulating apoptosis and regulating the cell cycle [8]. For instance, mounting studies have shown that the enzymatic hydrolysate of sinigrin has anti-cancer, anti-inflammatory, anti-oxidant, anti-bacterial, antifungal, wound healing properties, and biofumigation applications [9]. The pharmacological and therapeutic properties of GSLs that are beneficial to human health have made the Brassica species attract the considerable interest of many plant breeders and geneticists in the past 30 years [3,10,11].
On the basis of the chemical structure of different side chains, GSLs are classified into aliphatic, indolic, and aromatic GSLs, with methionine (or alanine, valine, leucine,

Plant Materials and Growth Conditions
The black mustard genotype used in this experiment was B. nigra cv. 1511-01 (an inbred line kept in our laboratory). The seeds of the mustard were planted in a controlled climate chamber at 26 • C/20 • C day/night temperature, 14/10 h light/dark photoperiod, 600 µmol·m −2 ·s −1 light intensity, and 60-70% relative humidity. The stems, rosette leaves, cauline leaves, inflorescences, and siliques (about 3 cm in length) used for the GSL profiling were collected during the flowering period. Cauline leaves, stems, and roots were used for RNA sequencing. All the materials were mixed and frozen in liquid nitrogen immediately and stored at −80 • C. All samples were collected from at least three plants, and both glucosinolate analysis and RNA sequencing were carried out in three biological replicates.

GSL Extraction and Analysis
GSL extraction and analysis were performed as previously described with only slight modification [45]. First, 0.25 g sample powder was boiled with 10 mL of 70% methanol after adding 200 µL of 5 mM glucotropaeolin (CAS, 5115-71-9; Code No., A5300,0050; Applichem, Darmstadt, Germany) as internal standard. Then, the supernatant was loaded onto a 1 mL mini-column to desulfate overnight with 200 µL arylsulfatase (Sigma-Aldrich Co., St. Louis, MO, USA). The mini-column containing 250 µL activated DEAE Sephadex A-25 was equilibrated at room temperature for at least 2 h prior to use. Resultant desulfoglucosinolates were eluted with ultrapure water and stored at −20 • C until analysis. Samples were analyzed by high-performance liquid chromatography (HPLC) in an Agilent 1200 HPLC system equipped with a C-18 reversed-phase column (250 × 4 µm, 5 µm, Bischoff, Leonberg, Germany). Elution was performed with ultrapure water (solvent A) and acetonitrile (solvent B) in a linear gradient from 0% to 20% B for 45 min and then constant 20% B for 6 min, followed by 100% A for 5 min prior to the injection of the next sample. The flow rate was 1 mL·min −1 (injection volume of 20 µL). The eluent was monitored by diode array detection at 229 nm. The data of GSL concentrations were analyzed using analysis of variance (ANOVA) software. Mean values were compared using the least significant difference at 0.05 significance level.

In Silico Identification of BniGSL Genes in B. nigra
Based on previous studies [11,42,46], sequences of the GSL biosynthetic genes in A. thaliana, B. rapa, and B. oleracea (Table S1) were acquired from The Arabidopsis Information Resource (TAIR) website (http://www.arabidopsis.org/) (accessed on 10 November 2020), The National Center for Biotechnology Information (NCBI) website (http://www.ncbi.nlm.nih.gov) (accessed on 10 November 2020), and the Brassica database website (http://brassicadb.cn/#/, BRAD) (accessed on 10 November 2020). The whole sequences of each B. nigra chromosome were downloaded from the BRAD database (http://brassicadb.cn/#/Download/) (accessed on 10 November 2020). Using the sequences of GSL genes acquired from the databases mentioned above, BLASTn was performed to search for homologous candidate genes in B. nigra. All candidates in the B. nigra genome, together with flank regions of 5000 bp upstream and downstream of each candidate, were analyzed and re-annotated using FGENESH (http://www.softberry.com/) (accessed on 10 November 2020). By comprehensively considering the candidate's reannotation results, the collinearity relationship with the known GSL genes, the similarity of amino acid sequence, and whether it contained the corresponding key domains or sequences, it is determined whether the candidate is a GSL gene. Given that the GSL genes of A. thaliana have been clearly named, the nomenclature system for the BniGSL genes in this study was based on their homology and identities with their counterparts in Arabidopsis. The resulting BniGSL genes were further used as query sequences to determine the precise locations of each gene on chromosomes through Oligo 6.0 software.

RNA Extraction, Library Construction, Sequencing, and Gene Expression Analysis
Total RNA of leaves, stems, and roots was isolated using Trizol reagent (Invitrogen, Waltham, MA, USA) following the manufacturer's instructions. Nanodrop, Qubit 2.0, and Aglient 2100 were used separately to measure the purity, concentration, integrity, and other values of RNA to ensure the qualified samples for transcriptome sequencing. The mRNA was purified with 20 mg total RNA by using oligo (dT) magnetic beads. After purification, the mRNA was fragmented by adding a fragmentation buffer. The fragments were used to synthesize the first-strand cDNA by using random hexamer adapters and reverse transcriptase (Code No., RR047A; Takara, Japan). The second-strand cDNA was synthesized using buffer, dNTPs, RNase H, and DNA polymerase I. The cDNA fragments that went through an end-repair process were added with a single 'A' base and had the sequencing joints connected. AMPure XP beads were used to select the fragment size for the ligature of adapter sequences. The products were purified, enriched with PCR, and then used as templates for sequencing. Sequencing and assembly were performed by the Biomarker Biotechnology Corporation (Beijing, China) using the Illumina HiSeqTM 2500 platform. The Log2FoldChange and false discovery rate (FDR) value of BniGSL genes between two samples (Root vs Stem, Root vs Leaf, and Stem vs Leaf) were calculated, and the BniGSL genes with |Log2FoldChange| > 1 and FDR < 0.05 were regarded as differentially expressed genes.
All RNA samples used for RNA-Seq were also used for qRT-PCR analysis (Code NO., RR820A; Takara, Japan). All qRT-PCR experiments included three independent biological repetitions. Brassica nigra tonoplastic intrinsic protein-41 (TIPS) gene was used as a reference gene [47]. The 2 −∆∆Ct method was used to calculate the relative gene expression values. The gene-specific primers were listed in Table S2.
The expression data of B. rapa is from Tong et al. [48] (GEO accession number GSE43245). Root, stem, and leaf tissues of B. rapa accession Chiifu-401-42 were collected from seven-week-old plants. The expression data of B. oleracea is from Liu et al. [43] and Yu et al. [49] (GEO accession number GSE 42891).

Analysis of Glucosinolate Profile in B. nigra
HPLC analyses revealed the presence of seven different types of GSLs in five organs of B. nigra cv. 1511-01 (Table 1, Figure S1), including two aliphatic GSLs (i.e., sinigrin and gluconapin), four indolic GSLs (i.e., 4-hydroxy glucobrassicin, glucobrassicin, 4-methoxyglucobrassicin, and 1-methoxyglucobrassicin), and one aromatic GSL (i.e., gluconasturtiin). Results showed that the stems of black mustard contained all the abovementioned GSLs. Rosette leaves contained all GSLs except gluconapin. Cauline leaves and siliques contained all GSs except gluconasturtiin. However, only four GSLs were detected in the inflorescences of black mustard, with three indolic GSLs were absent. In terms of total GSL concentration, black mustard stems possessed the highest concentration of total GSL, followed by inflorescences, cauline leaves and siliques, and rosette leaves contained the lowest total GSL content. Notably, although the GSL profiles in different organs of black mustard were different, sinigrin was the predominant GSL in all five tested organs and accounted for 90.7-98.5% of the total GSL. Moreover, as the main GSL, the content of sinigrin in five organs ranked the same as the total GSL. These results indicated that the GSL profiles in B. nigra are significantly different from those in B. rapa and B. oleracea, suggesting that there may be a major GSL biosynthesis pathway in B. nigra, which directs the synthesis of sinigrin.

Identification and Annotation of BniGSL Genes from B. nigra Genome
We first searched for BniGSL genes in the whole genome of B. nigra before looking for clues as to why black mustard preferred to synthesize sinigrin. Preliminary BLAST searches for BniGSL genes in the whole-genome sequences of B. nigra were performed using GSL genes of A. thaliana, B. rapa, and B. oleracea. Using these pre-screened BniGSL genes, re-annotation, and BLASTP search against known GSL gene sequences of A. thaliana, B. rapa, and B. oleracea resulted in 184 BniGSL genes as orthologs of 82 of the 95 known AtGSL genes, with 13 AtGSL genes having no B. nigra ortholog. The number of BniGSL genes in B. nigra has expanded, with an average of 2.24 copies per gene. It is worth noting that there are thirteen copies of SOT18 in B. nigra, which is far more than the average. All the identified BniGSL genes are listed in Tables 2 and 3, and the sequences of DNA, CDS, and amino acid are listed in Table S3. A total of 124, 48, and 12 BniGSLs encode enzymes involved in glucosinolate biosynthesis (Table 2), TFs with regulatory functions, and transporters involved in glucosinolate transport (Table 3), respectively. More specifically, there are 16, 55, 21, and 32 genes involved in side-chain elongation, core structure synthesis, side-chain modification, and co-substrate pathways, respectively (Table 2). Among the 48 TFs, 33, 8, and 4 TFs act as activators, repressors, and mediators, respectively. HY5, which has both activating and suppressing functions during glucosinolate biosynthesis, consists of three copies in B. nigra (Table 3). In addition, there are 12 BniGSL genes encoding five transporters in black mustard (Table 3).   Of the 184 BniGSL genes, 182 are unevenly mapped among the eight chromosomes of B. nigra, with 14, 28, 36, 21, 28, 20, 14, and 21 BniGSL genes anchoring on chromosome B1-B8, respectively ( Figure 1). Two other BniGSL genes are distributed on a large scaffold, which have not yet been mapped onto chromosomes (Tables 2 and 3). In A. thaliana, there are 25 genes that constitute 11 tandem-duplicate gene modules. Here, we found that there are 20 tandem-duplicate gene modules in B. nigra involving 46 genes. However, the gene families involved in tandem duplication are not exactly the same between these two species (Table 2 and Figure 1).
By doing pairwise sequence alignment of GSL genes between B. nigra and A. thaliana, we found that a total of 13 homologs corresponding to A. thaliana GSL genes were absent in B. nigra, including two genes involved in side-chain elongation (i.e., MAM3 and IPMI-SSU3), a gene related to core structure synthesis (i.e., CYP79F2), seven genes involved in side-chain modification (i.e., FMO GS-OX2/3/4/6/7 , AOP3, and GSL-OH), a gene related to co-substrate pathways (i.e., APR2), and two TFs (i.e., MYB76 and MYB115) ( Table 2). In general, 95.7% of the BniGSL genes identified in this study shared 51-97% amino acid sequence identity with AtGSL genes in A. thaliana, with an average of 78.4%.

BniGTR1-3
BniB05g052360 * means that the orthologous gene in B. nigra has been missing. In the chromosome location, the positive (+) and negative (−) signs indicate the existence of a gene on the positive and negative strand of that specific chromosome, respectively.

Expression Patterns of BniGSL Genes Encoding Enzymes in Three Organs of B. nigra
Transcriptome sequencing was conducted to investigate the expression of BniGSL genes involved in different processes of GSL biosynthesis, and those that participated in aliphatic, indolic, and aromatic GSLs in three organs (i.e., root, stem, and leaf). First, the correlation analysis between biological replicates was performed, and the results showed that the three biological replicates of root, stem, and leaf all have a good correlation ( Figure S2), indicating the high data reliability in this study.
Among 184 BniGSL genes determined in this study, 172 were detected in at least one organ, and 12 genes were not expressed in all three organs. Moreover, there were 34 BniGSL genes whose expression in roots, stems, and leaves all lower than 1 FPKM (Table S4). The low expression of these genes suggests that they may contribute less to GSL biosynthesis in these three organs.
On the basis of the GSL biosynthetic pathways involving BniGSL genes and their biological functions, the expression of 184 BniGSL genes in roots, stems, and leaves was further analyzed. The biosynthesis of aliphatic GSLs can be divided into three main phases, and dozens of enzymes have been determined to participate in the corresponding reactions ( Figure 2). The side-chain elongation phase of aliphatic GSLs consists of six steps and eleven enzymes, of which MAM3 and IPMI SSU3 have been lost in B. nigra. Seven enzymatic steps of the core structure synthesis involved 14 enzymes, with 38 BniGSL genes encoding 13 enzymes in B. nigra (except CYP79F2). The aliphatic GSL side-chain modification phase mainly includes S-oxygenation, side-chain oxygenation, and further conversion of hydroxyalkyl GSL into benzoylated and sinapoylated GSLs, which are catalyzed by five types of enzymes. However, both AOP3 and GSL-OH are absent in B. nigra, and there are only two of the seven Arabidopsis FMOGS-OX are retained in B. nigra. Encouragingly, although there are some aliphatic GSL synthetic BniGSL genes that had an extremely low expression, expression data showed that at least one BniGSL gene was highly expressed in every step of the synthesis of sinigrin. Moreover, the expression of these highly expressed BniGSL genes in leaves and stems was generally higher than that in roots (Figure 2A), and this finding was consistent with the fact that the side-chain elongation of aliphatic GSLs should be performed in green tissues containing chloroplasts.  The synthesis of indolic GSL does not require side-chain elongation and only consists of core structure synthesis and side-chain modification, which can be divided into seven and two steps, respectively. All enzymes involved in indolic GSL synthesis have corresponding homologs in B. nigra, and 33 and 14 BniGSL genes encode 12 and 7 enzymes, respectively, to take part in the above two phases. Remarkably, four BniGSL genes encoding CYP79B2 and CYP79B3, the indolic cytochrome P450 enzymes that convert tryptophan derivatives into aldoximes, and all indolic BniGSL genes responsible for the side-chain modification, showed generally low expression, especially in stems and leaves ( Figure 2B). In addition, 12 BniGSL genes encoding three cytochrome P450 members (i.e., CYP79A2, CYP79C1 and CYP79C2) were involved in the aromatic core structure GSL pathway but were almost not expressed in roots, stems, and leaves ( Figure 2C). This finding might explain the few aromatic GSLs in B. nigra, and their content was extremely low.

Expression Patterns of BniGSL Genes Encoding TFs, Transporters and Proteins Involved in Co-Substrate Pathways
Thus far, a total of 10 TFs of the R2R3 domain MYB family have been characterized as key players in the regulation of GSL genes. MYB28, MYB29, and MYB76 play critical regulatory roles in aliphatic GSL biosynthesis. MYB34, MYB51, and MYB122 are essential regulators of indolic GSL biosynthesis. MYB115 and MYB118 are responsible for modulating the synthesis of aromatic GSLs. Similar to the enzymes involved in GSL biosynthesis, multiple aliphatic GSL-related BniMYBs showed high expression in stems and leaves, whereas all indolic and aromatic GSL-related BniMYBs had low expressions ( Figure 3A). As expected, MYC2, MYC3, and MYC4, the common regulators of aliphatic and indolic GSL synthesis, were highly expressed in all three organs of B. nigra ( Figure 3A), and had similar expression patterns of some BniGSL genes (i.e., BniGGP1-1/2, BniSUR1-1, BniUGT74B1, BniSOT16-2, and BniSOT18-1) involved in the core structure synthesis of aliphatic and indolic GSLs (Figure 2). Among other TFs that regulated GSL biosynthesis, except IQD1 and FRS 7, at least one copy of a BniGSL gene was expressed in at least one organ ( Figure 3B).
Five proteins have been experimentally characterized as transporters in GSL biosynthesis. BAT5s, which acts as chloroplast transporter in the side-chain elongation phase, exhibited a similar expression pattern to the GSL genes related to aliphatic GSL biosynthesis in B. nigra. SULTR1;1 and SULTR1;2 are two sulfate transporters that function in Arabidopsis roots, and their orthologs in B. nigra also showed a relatively high expression in roots. GTR1 and GTR2 are responsible for transporting synthesized GSLs from leaves to seeds and roots [50,51]. Expression data also showed that BniGTR1s and BniGTR2s were predominantly expressed in roots ( Figure 3C).
As expected, MYC2, MYC3, and MYC4, the common regulators of aliphatic and indolic GSL synthesis, were highly expressed in all three organs of B. nigra ( Figure 3A), and had similar expression patterns of some BniGSL genes (i.e., BniGGP1-1/2, BniSUR1-1, BniUGT74B1, BniSOT16-2, and BniSOT18-1) involved in the core structure synthesis of aliphatic and indolic GSLs (Figure 2). Among other TFs that regulated GSL biosynthesis, except IQD1 and FRS 7, at least one copy of a BniGSL gene was expressed in at least one organ ( Figure 3B). Five proteins have been experimentally characterized as transporters in GSL biosynthesis. BAT5s, which acts as chloroplast transporter in the side-chain elongation phase, The gene expression analysis showed that the expression patterns of the BniGSL genes encoding proteins involved in the co-substrate pathway were also different. For BniGSL genes that function in the sulfur assimilation process, except BniAPK2-3, BniAPR1-2, and BniGSH1-4, the remaining 17 genes were highly expressed in all three organs ( Figure 4A), which were similar to the expression patterns of some core structure synthesis related BniGSL genes that play roles in aliphatic and indolic GSL synthesis ( Figure 2). However, BniGSL genes involved in the synthesis of tryptophan and phenylalanine metabolism were all generally low-expressed in the three organs ( Figure 4B,C), which indicated that the supplies of tryptophan for the synthesis of indolic GSL and benzoyl-coenzyme A (BzCoA) for the synthesis of benzoylated GSLs (BzGSLs) might not be sufficient. The low expression characteristics of these genes may be one of the reasons for the low indolic GSL content in B. nigra and the absence of BzGSLs and sinapoylated GSLs (SnGSLs). tabolism were all generally low-expressed in the three organs ( Figure 4B,C), which indicated that the supplies of tryptophan for the synthesis of indolic GSL and benzoyl-coenzyme A (BzCoA) for the synthesis of benzoylated GSLs (BzGSLs) might not be sufficient. The low expression characteristics of these genes may be one of the reasons for the low indolic GSL content in B. nigra and the absence of BzGSLs and sinapoylated GSLs (SnGSLs).

Expression Patterns of Candidate Key Genes Involved in the Synthesis of Aliphatic GSLs by qRT-PCR
qRT-PCR analysis was performed to reconfirm the expression patterns of 15 candidate key genes (including 12 structural genes and three TFs) involved in the biosynthesis of aliphatic GSLs. As shown in Figure 5, the results of qRT-PCR indicated that the expression levels of nine genes (i.e., BniBCAT4-1, BniMAM1-2, BniIMD1, BniCYP79F1, Bni-CYP83A1-2, BniUGT74C1-1, BniFMOGS-OX1, BniAOP2-1, and BniAOP2-2) out of 12 structural genes in stems and leaves were five times or more than that in roots. The expression levels of the other three structural genes in three organs did not differ by more than 2.5 times. Nevertheless, in addition to BniSOT17, the expression levels of BniIIL1-1/2 and BniBCAT3-1 in stems and leaves were still higher than those in roots. For TFs, notably, as

Expression Patterns of Candidate Key Genes Involved in the Synthesis of Aliphatic GSLs by qRT-PCR
qRT-PCR analysis was performed to reconfirm the expression patterns of 15 candidate key genes (including 12 structural genes and three TFs) involved in the biosynthesis of aliphatic GSLs. As shown in Figure 5, the results of qRT-PCR indicated that the expression levels of nine genes (i.e., BniBCAT4-1, BniMAM1-2, BniIMD1, BniCYP79F1, BniCYP83A1-2, BniUGT74C1-1, BniFMO GS-OX1 , BniAOP2-1, and BniAOP2-2) out of 12 structural genes in stems and leaves were five times or more than that in roots. The expression levels of the other three structural genes in three organs did not differ by more than 2.5 times. Nevertheless, in addition to BniSOT17, the expression levels of BniIIL1-1/2 and BniBCAT3-1 in stems and leaves were still higher than those in roots. For TFs, notably, as an orthologous gene of MYB28, which is the main regulatory gene of aliphatic GSL synthesis, BniMYB28-2 was extremely highly expressed in stems and leaves. The expression of BniMYC3-1 in stems and leaves was also higher than that in roots. In contrast, the expression of BniMYB29-1 in three organs was not much different.
In short, the results of qRT-PCR were consistent with the results of RNA-Seq (Figures 3 and 4, Table S4), which together indicated that most of the candidate key genes involved in the synthesis of aliphatic GSLs in B. nigra were mainly highly expressed in stems and leaves, while relatively low in roots.
of BniMYC3-1 in stems and leaves was also higher than that in roots. In contrast, the expression of BniMYB29-1 in three organs was not much different.
In short, the results of qRT-PCR were consistent with the results of RNA-Seq ( Figures  3 and 4, Table S4), which together indicated that most of the candidate key genes involved in the synthesis of aliphatic GSLs in B. nigra were mainly highly expressed in stems and leaves, while relatively low in roots.

Key MAM Genes Controlling Side-Chain Elongation in B. nigra
The aliphatic GSL biosynthesis initiated by methionine first needs to undergo a six-step side-chain elongation involving five types of enzymes. Among these enzymes, the MAM family remarkably contributes to the diversity of synthesized GSLs because different MAM members have different preferences for catalyzing the side-chain elongation process. In Arabidopsis, three tandemly duplicated MAM genes were identified, named MAM1, MAM2 (absent in ecotype Columbia), and MAM3. The functional analysis revealed that MAM2 and MAM1 were correlated with the accumulation of 3 and 4 carbon (C) side-chain GSLs that had undergone the first and first two rounds of chain elongation, respectively [52,53], whereas MAM3 was able to catalyze the condensations in the first six elongation cycles [13].
To clarify the evolutionary relationships among MAM homologs, we performed a detailed phylogenetic analysis of predicted amino acid sequences of MAM members from Arabidopsis and three basic diploid species of Brassica. On the basis of syntenic and sequence similarity analysis, we identified nine, nine, and five MAM members from B. rapa, B. oleracea, and B. nigra, respectively. The resulting phylogenetic tree indicated that four, five, and two MAM members of the three basic diploid species of Brassica were phylogenetically close to AtMAM1/2, and three, two, and two MAM members seemed to be closely related to AtMAM3. In addition, there were two BraMAM, two BolMAM, and a BniMAM that were phylogenetically distant from AtMAM1/2 and AtMAM3 ( Figure 6A).

BniCYP79F1 Was Extremely Highly Expressed in B. nigra
The substrate-specific cytochrome P450s of the CYP79 family act as the entry point in GSL core structure synthesis by catalyzing the conversion of amino acid derivatives into the corresponding aldoximes. In Arabidopsis, seven CYP79s have been functionally characterized, and different members showed different preferences for amino acid-derived substrates. To reveal the possible connection between the expression of CYP79s and the GSL profiles, CYP79s involved in GSL synthesis in B. rapa, B. oleracea, and B. nigra were further analyzed (Figure 7). CYP79F1 and CYP79F2 are responsible for the biosynthesis of methionine-derived GSLs, and CYP79F2 only converts long-chained methionine derivatives into aldoximes [54][55][56]. Results showed that although there is only one ortholog of CYP79F1 in each Brassica species, they were all highly expressed in at least one organ. The only difference was that Bra026058 was predominantly expressed in roots, Bol038222 was mainly expressed in stems, followed by leaves, while BniCYP79F1 was extremely highly expressed in stems and leaves, while slightly lower in roots ( Figure 7B-D). Surprisingly, there is no ortholog of CYP79F2 in Brassica ( Figure 7A). This fact may also connect to the low content of long-chained aliphatic GSLs in the GSL profiles of Brassica. On the basis of expression data of the BniGSL genes obtained by RNA sequencing in this study, as well as the reported expression information of GSL genes in B. rapa and B. oleracea [48,49], we found that only four, two, and four MAM genes were expressed in B. rapa, B. oleracea, and B. nigra, respectively. Interestingly, a group of genes comprising Bra013007, Bol017070, and BniMAM1-2, was the only group of orthologs that showed high expression in B. oleracea and B. nigra but silenced in B. rapa ( Figure 6B-D). This expression difference most likely explains why 3C and/or 4C GSLs can be synthesized and accumulated in large quantities in B. nigra (e.g., sinigrin) and B. oleracea (e.g., progoitrin, gluconapin, glucoraphanin, and sinigrin), but not in B. rapa. In addition, Bol020647 and BniMAM1-1 may also contribute to the synthesis of 3C GSLs ( Figure 6C,D). Meanwhile, it is likely the loss of MAM3 leads to the low contents of 4C, 5C and long-chained aliphatic GSLs in B. nigra ( Figure 6D).

BniCYP79F1 Was Extremely Highly Expressed in B. nigra
The substrate-specific cytochrome P450s of the CYP79 family act as the entry point in GSL core structure synthesis by catalyzing the conversion of amino acid derivatives into the corresponding aldoximes. In Arabidopsis, seven CYP79s have been functionally characterized, and different members showed different preferences for amino acid-derived substrates. To reveal the possible connection between the expression of CYP79s and the GSL profiles, CYP79s involved in GSL synthesis in B. rapa, B. oleracea, and B. nigra were further analyzed (Figure 7). CYP79F1 and CYP79F2 are responsible for the biosynthesis of methionine-derived GSLs, and CYP79F2 only converts long-chained methionine derivatives into aldoximes [54][55][56]. Results showed that although there is only one ortholog of CYP79F1 in each Brassica species, they were all highly expressed in at least one organ. The only difference was that Bra026058 was predominantly expressed in roots, Bol038222 was mainly expressed in stems, followed by leaves, while BniCYP79F1 was extremely highly expressed in stems and leaves, while slightly lower in roots ( Figure 7B-D). Surprisingly, there is no ortholog of CYP79F2 in Brassica ( Figure 7A). This fact may also connect to the low content of long-chained aliphatic GSLs in the GSL profiles of Brassica.
CYP79B2 and CYP79B3 take part in indolic GSL synthesis [57][58][59]. Genomic analysis showed that CYP79B2 retained three copies in each of the three Brassica species, while CYP79B3 retained only one copy (Figure 7). Moreover, these 12 CYP79s have been detected to be expressed in at least one organ. However, the expression patterns of CYP79B2 and CYP79B3 in the three species were different. In general, CYP79B2 and CYP79B3 were highly expressed in all three organs of B. rapa, while those in B. oleracea were predominantly expressed in leaves, followed by in roots. In B. nigra, their expression can be detected in roots but lower than those in stems and leaves ( Figure 7B-D). These expression features may partly explain that the content of indolic GSL in B. nigra is lower than those in B. rapa and B. oleracea. In addition, homologous genes of CYP79A2, CYP79C1, and CYP79C2 in B. nigra are more than those in B. rapa and B. oleracea. However, except for one copy of CYP79A2 in B. oleracea, the expression of all homologs of these three genes in all three species was extremely low.

The Difference in AOP2 Genes Greatly Affect the Diversity of GSLs in Brassica
Side-chain modification is another pathway to enrich aliphatic GSL species in addition to the side-chain elongation. The GS-AOP locus is responsible for side-chain oxygenation and contains three genes encoding 2-oxoglutarate-dependent dioxygenases in Arabidopsis, namely AOP1, AOP2, and AOP3 [60]. AOP2 and AOP3 are located within the GSL-ALK and GSL-OHP loci, respectively, and can convert methylsulfinylalkyl GSL into alkenyl and hydroxyalkyl GSL, respectively [23,61]. Although the functionality of AOP1 is unknown, it is considered to be the ancestral gene of AOP2 and AOP3 by gene duplication events, suggesting that AOP1 may also function in GSL biosynthesis [60].
AOP genes in B. rapa, B. oleracea, and B. nigra were identified and subjected to expression analysis to assess the contribution of AOP genes on aliphatic GSL diversity in B. nigra, as well as the difference in AOP genes in different Brassica species. Results showed that B. rapa, B. oleracea, and B. nigra contained three, two, and two AOP1 genes, respectively; three, one, and two functional AOP2 genes, respectively; and no AOP3 homolog ( Figure 8A). The absence of AOP3 in these three Brassica species may be the key reason why they rarely contain hydroxyalkyl GSLs. The AOP1 genes in Brassica were mainly expressed in roots ( Figure 8B-D), suggesting that they may function in the side-chain modification of aliphatic GSLs in roots. Most notably, significant differences were observed in the gene function and expression of AOP2 genes in Brassica. Although three AOP2 copies in B. oleracea were identified, the presence of a premature stop codon resulted in two of them being nonfunctional [43], while the other functional BolAOP2 had no expression data ( Figure 8C), which might be caused by low expression. In contrast, all three AOP2 genes in B. rapa were functional, and Bra00848 was highly expressed in both stems and leaves ( Figure 8B). Similarly, two copies of AOP2 in B. nigra, i.e., BniAOP2-1 and BniAOP2-1, were also extremely highly expressed in stems and leaves ( Figure 8D). The difference in AOP2 genes in Brassica supports their unique GSL profiles and partially explains why B. oleracea is rich in glucoraphanin, but not in B. rapa, and why sinigrin is abundant in B. nigra.

The Difference in AOP2 Genes Greatly Affect the Diversity of GSLs in Brassica
Side-chain modification is another pathway to enrich aliphatic GSL species in addition to the side-chain elongation. The GS-AOP locus is responsible for side-chain oxygenation and contains three genes encoding 2-oxoglutarate-dependent dioxygenases in Arabidopsis, namely AOP1, AOP2, and AOP3 [60]. AOP2 and AOP3 are located within the GSL-ALK and GSL-OHP loci, respectively, and can convert methylsulfinylalkyl GSL into alkenyl and hydroxyalkyl GSL, respectively [23,61]. Although the functionality of AOP1 is unknown, it is considered to be the ancestral gene of AOP2 and AOP3 by gene duplication events, suggesting that AOP1 may also function in GSL biosynthesis [60].
AOP genes in B. rapa, B. oleracea, and B. nigra were identified and subjected to expression analysis to assess the contribution of AOP genes on aliphatic GSL diversity in B. nigra, as well as the difference in AOP genes in different Brassica species. Results showed that B. rapa, B. oleracea, and B. nigra contained three, two, and two AOP1 genes, respectively; three, one, and two functional AOP2 genes, respectively; and no AOP3 homolog ( Figure  8A). The absence of AOP3 in these three Brassica species may be the key reason why they rarely contain hydroxyalkyl GSLs. The AOP1 genes in Brassica were mainly expressed in roots ( Figure 8B-D), suggesting that they may function in the side-chain modification of aliphatic GSLs in roots. Most notably, significant differences were observed in the gene function and expression of AOP2 genes in Brassica. Although three AOP2 copies in B. oleracea were identified, the presence of a premature stop codon resulted in two of them being nonfunctional [43], while the other functional BolAOP2 had no expression data (Figure 8C), which might be caused by low expression. In contrast, all three AOP2 genes in B. rapa were functional, and Bra00848 was highly expressed in both stems and leaves ( Figure  8B). Similarly, two copies of AOP2 in B. nigra, i.e., BniAOP2-1 and BniAOP2-1, were also extremely highly expressed in stems and leaves ( Figure 8D). The difference in AOP2 genes in Brassica supports their unique GSL profiles and partially explains why B. oleracea is rich in glucoraphanin, but not in B. rapa, and why sinigrin is abundant in B. nigra.

Discussion
Despite being one of the three ancestral Brassica species in U's triangle model, studies on B. nigra always lag behind that on B. rapa and B. oleracea, including the study on the identification of GSL biosynthesis genes at the genome-wide level. Although the absolute content of various GSLs in Brassica can be strongly influenced by environmental factors, the patterns of GSL are mainly controlled genetically [62,63]. Moreover, it was reported that genetic factors were dominant in controlling the synthesis of aliphatic GSLs [64]. Therefore, the genome-wide and expression analyses of GSL genes can help delineate the dominant GSL synthesis pathway. Here, through these analyses, we identified the genes involved in the biosynthesis of GSLs in B. nigra, and proposed a sinigrin biosynthesis pathway involving multiple candidate key genes in B. nigra (Table S3, Figure 9).

Discussion
Despite being one of the three ancestral Brassica species in U's triangle model, studies on B. nigra always lag behind that on B. rapa and B. oleracea, including the study on the identification of GSL biosynthesis genes at the genome-wide level. Although the absolute content of various GSLs in Brassica can be strongly influenced by environmental factors, the patterns of GSL are mainly controlled genetically [62,63]. Moreover, it was reported that genetic factors were dominant in controlling the synthesis of aliphatic GSLs [64]. Therefore, the genome-wide and expression analyses of GSL genes can help delineate the dominant GSL synthesis pathway. Here, through these analyses, we identified the genes involved in the biosynthesis of GSLs in B. nigra, and proposed a sinigrin biosynthesis pathway involving multiple candidate key genes in B. nigra (Table S3, Figure 9). In this study, the GSL content survey once again confirmed that sinigrin was the most dominant GSL in B. nigra (Table 1), accounting for more than 90% of the total GSLs. In order to explore why B. nigra mainly synthesized and accumulated sinigrin from the genetic perspective. We searched out all GSL genes as much as possible on the basis of the latest whole-genome data of B. nigra. BniGSL genes were identified by comparing their In this study, the GSL content survey once again confirmed that sinigrin was the most dominant GSL in B. nigra (Table 1), accounting for more than 90% of the total GSLs. In order to explore why B. nigra mainly synthesized and accumulated sinigrin from the genetic perspective. We searched out all GSL genes as much as possible on the basis of the latest whole-genome data of B. nigra. BniGSL genes were identified by comparing their related homologs in A. thaliana, B. rapa, and B. oleracea. A total of 184 BniGSL genes were identified, of which 182 could be labeled on eight chromosomes of B. nigra (Tables 2 and 3, Figure 1). Compared with the identified GSL genes in B. rapa and B. oleracea, more members of BniGSL genes were determined in this study, which might be due to the incomplete genome data of B. rapa and B. oleracea previously [42,43,65]. Nevertheless, the GSL genes in these three Brassica species were all amplified considerably in their process of evolution. Six modes of gene duplication have been demonstrated in previous research, including wholegenome duplication (WGD) and tandem duplication (TD). The Brassica genome is proven to have triplicated soon after its divergence from Arabidopsis [44,66,67]. A functional bias is observed in genes retained after WGD and TD, which may show positive or negative correlations in the expansion of different members in a certain gene family. Duplicated genes may enhance the potential for the quantitative variation of a particular trait [68][69][70]. Similar to the GSL genes in B. rapa and B. oleracea, WGD and TD are the main mechanisms accounting for the expansion of BniGS genes, and most of them are present in multiple copies ( Figure 1, Tables 2 and 3). For instance, previous research showed that SOT18 is a multigene subfamily with tandem arrays of genes in B. rapa (10 members) and R. sativus (11 members) [42,71]. Thirteen BniSOT18s were identified in the current B. nigra genome and there were four groups of members expanded through TD ( Figure 1, Table 2), indicating that TD was another factor responsible for the expansion of GSL genes during the evolution of B. nigra genome similar to those of B. rapa and R. sativus.
In general, there is no substantial difference among the inventory of GSL genes in B. nigra and those in B. rapa and B. oleracea, since most GSL genes have successfully retained at least one copy during the evolution of the Brassica, despite the difference in copy number of some GSL genes (Tables 2 and 3). Moreover, the absence of some GSL genes exists in all three Brassica species. For example, no ortholog of CYP79F2 and AOP3 has been identified in Brassica [42,43,46], which basically explains the lack of long-chain aliphatic and hydroxyalkyl GSLs [36,38,72]. However, there are significant differences in the GSL profiles of the three Brassica species. For example, sinigrin is the main GSL in B. nigra, and the sinigrin content in B. nigra is much higher than those in B. rapa and B. oleracea.
By transcriptome sequencing, we believe that the specific expression patterns of BniGSL genes in B. nigra largely determine its unique GSL profile. In the aliphatic GSL biosynthesis pathway, except the oxidative decarboxylation that involves only one gene (i.e., BniIMD1), each step (until the alkenylation of basic GSL) involves two or more genes, and at least one gene is highly expressed in stems and leaves (Figure 2A), ensuring the synthesis of a large amount of aliphatic GSL. Furthermore, the loss of MAM3 orthologs results in failure to synthesize aliphatic GSL with long side chain, while the absence of AOP3 and GSL-OH prevented the hydroxyalkylation of the basic GSL and the conversion of alkenyl GSL into hydroxylated alkenyl GSL. Therefore, we speculated that the expression characteristics of the aliphatic BniGSL genes and the above-mentioned limitations are possibly the main reason for the synthesis of a large amount of sinigrin in B. nigra. Specifically, we concluded that the candidate key genes involved in the sinigrin synthesis pathway are BniBCAT4-1/2, BniBAT5-1/2, BniMAM1-2, BniIPMI SSU1-1/BniIPMI SSU2/BniIIL1-1/2, BniIMD1, BniBCAT3-1, BniCYP79F1, BniCYP83A1-2, BniGSTF11/BniGSTU20, BniGGP1-1/2, BniSUR1-1, BniUGT74B1/BniUGT74C1-1, BniSOT16-2/BniSOT17/BniSOT18-1, BniFMO CG-OX1 , and BniAOP2-1/2 ( Figure 9). Moreover, the results of expression analysis of MAMs, CYP79s, and AOPs in three Brassica species further indicate that the differences in gene expression have significant effects on GSL patterns and contents, and BniMAM1-2, BniCYP79F1, and BniAOP2-1/2 may control the key nodes of the sinigrin synthesis pathway in B. nigra. In addition, the low expression of some key indolic and aromatic BniGSL genes in B. nigra is partially responsible for low indolic and aromatic GSL, respectively. These findings enriched our understanding of GSL biosynthesis patterns in B. nigra and provided guidance for changing the GSL profile in B. nigra, e.g., by regulating the expression of key BniGSL genes (e.g., overexpression/knockout) to change the GSL synthesis pathway in B. nigra, thereby constructing new varieties with different GSL profiles (e.g., high/low sinigrin). Moreover, the identification of GSL biosynthesis pathways in B. nigra also provided a reference for the regulation of GSL synthesis in other Brassica species.
In this study, we determined the GSL profile of B. nigra, made an inventory of BniGSL genes and characterized their expression patterns. This research contributes to the functional analysis of BniGSL genes and improvement of GSLs in B. nigra, and provides a new perspective for future research on GSL synthesis in other species.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/horticulturae7070173/s1; Figure S1: HPLC chromatograms of glucosinolates isolated from different organs of Brassica nigra. Figure S2: Correlation analysis of samples. Table S1: GSL genes reported in Brassica rapa and Brassica oleracea. Table S2: List of primers used for qRT-PCR. Table S3: The DNA, CDS and amino acid sequences of glucosinolate biosynthetic related genes (BniGSLs) in Brassica nigra. Table S4