Metabolite Profiling and Transcriptome Analysis Provide Insight into Seed Coat Color in Brassica juncea

The allotetraploid species Brassica juncea (mustard) is grown worldwide as oilseed and vegetable crops; the yellow seed-color trait is particularly important for oilseed crops. Here, to examine the factors affecting seed coat color, we performed a metabolic and transcriptomic analysis of yellow- and dark-seeded B. juncea seeds. In this study, we identified 236 compounds, including 31 phenolic acids, 47 flavonoids, 17 glucosinolates, 38 lipids, 69 other hydroxycinnamic acid compounds, and 34 novel unknown compounds. Of these, 36 compounds (especially epicatechin and its derivatives) accumulated significantly different levels during the development of yellow- and dark-seeded B. juncea. In addition, the transcript levels of BjuDFR, BjuANS,BjuBAN, BjuTT8, and BjuTT19 were closely associated with changes to epicatechin and its derivatives during seed development, implicating this pathway in the seed coat color determinant in B. juncea. Furthermore, we found numerous variations of sequences in the TT8A genes that may be associated with the stability of seed coat color in B. rapa, B. napus, and B. juncea, which might have undergone functional differentiation during polyploidization in the Brassica species. The results provide valuable information for understanding the accumulation of metabolites in the seed coat color of B. juncea and lay a foundation for exploring the underlying mechanism.


Introduction
Brassica juncea L. (AABB, n = 18), also called mustard, is an important crop that was formed by means of the hybridization of the ancestors of the diploid species Brassica rapa (AA, n = 10) and Brassica nigra (BB, n = 8) [1]. B. juncea has a long history of cultivation and is cultivated worldwide in countries such as China, India, Canada, and Europe. Different varieties and cultivars have been classified for root-use, stem-use, leaf-use and oil-use based on different edible organs [2][3][4]. In Brassica oilseed crops, the yellow-seeded varieties are preferred over the dark-seeded varieties due to their thin seed coat and high oil content [5,6]. Among the oilseed Brassica species, the naturally yellow-seeded germplasm recourses usually exist in the diploid B. rapa (Yellow Sarson type) and allotetraploid B. juncea varieties [7,8] but are not found in Brassica oleracea and Brassica napus. Furthermore,

Differential Metabolic Profiling Analysis Based on OPLS-DA
All of the peak areas of the discernible chromatograms were calculated using Xcalibur 3.1 software. The PCA score plot revealed a clear separation between the developing seeds from the yellow-and dark-seeded B. juncea, which was indicated by PCA1 (32%) and PCA2 (21%) (Supplementary Figure S1a), suggesting a significantly different metabolic profile among them, with a dynamic change accumulation during seed development. Meanwhile, the biological replicates of the same sample types showed a good stability and high reliability through the UPLC-HESI-MS/MS detection (Supplementary Figure S1a).
To identify metabolites that were differentially abundant between the yellow-and dark-seeded B. juncea, we performed differential analysis on the samples from three developmental stages based on OPLS-DA. The OPLS-DA model compared the metabolite content of the samples in pairs to evaluate the differences the between yellow-and darkseeded B. juncea at 20 DAP (R2X = 0.488, R2Y = 0.979, Q2 = 0.915), 30 DAP (R2X = 0.453, R2Y = 0.989, Q2 = 0.958) and 40 DAP (R2X = 0.408, R2Y = 0.996, Q2 = 0.977). All Q2 values of the model were close to 1, and all blue Q2 values to the left are lower than the original points to the right (Supplementary Figure S2a-c), indicating that these models could be used to further screen for differentially abundant metabolites.
We conducted a subsequent analysis of the differential metabolites that accumulated in the yellow-versus the dark-seeded B. juncea, with a VIP value ≥ 1, p value < 0.05, and a fold change value ≥ 1.6 or ≤ 0.625 for each group (Supplementary Table S2 Table  S3). As expected, there were 17 different metabolites in the flavonoid group, which is more than any other group (Table 1) 1 (1926.20-fold), and [DP4] (7.948-fold) were produced in greater concentrations in the dark-seeded variety than in the yellowseeded variety, respectively (Table 1, Supplementary Table S4). In addition, the eriodictyol contents (615.06-fold) had higher accumulation levels in the dark-seeded variety than in yellow-seeded variety. Our results support that epicatechin and PAs are important factors affecting seed coat formation in B. juncea. mode. A total of 13,574 base peak chromatograms identified by MS-DIAL ver software were produced from seeds at different developmental stages in yel dark-seeded B. juncea (Figure 1b-d). Among them, 1440 could be identified and by MS-DIAL ver. 4.18 software with a mass bank negative MS/MS database. We 236 discernible base peak chromatograms according to their retention times (RT)  MS, and MS/MS spectral data together with the available key standards and p  reported information in this study, including 31 phenolic acids, 47 flavonoids, 1  inolates, 38 lipid compounds, 69 other hydroxycinnamic acid compounds, an  known compounds (Figure 1e-g, Supplementary Table S1). Both the total base p matograms and the base peak intensity of the extracted compounds differed be dark mustard seeds and the yellow mustard seeds (Figure 1b-g).   -1500) at 20, 30, and 40 days after pollination (DAP), respectively. The black and green lines indicate darkand yellow-seeded B. juncea, respectively. (e-g) All of the detected compounds were generated from seeds at 20, 30, and 40 DAP, respectively. The green and black dots represent compounds from yellow-and dark-seeded B. juncea, respectively. The horizontal and vertical coordinates represent the retention times (RTs) and peak areas of the corresponding compounds, respectively. juncea. (a-c) Volcano plots of differentially abundant metabolites. Each point in a volcano plot represents a metabolite; the abscissa represents the logarithm of the quantitative difference multiples of a metabolite in two samples, and the ordinate represents the variable importance in project (VIP) value. The green dots represent down-regulated differentially abundant metabolites, the red dots represent up-regulated differentially abundant metabolites, and the black dots represent metabolites with no difference. (d) Venn diagram of differentially abundant metabolites in developing seeds of B. juncea. DAP = days after pollination. (a-c) Volcano plots of differentially abundant metabolites. Each point in a volcano plot represents a metabolite; the abscissa represents the logarithm of the quantitative difference multiples of a metabolite in two samples, and the ordinate represents the variable importance in project (VIP) value. The green dots represent down-regulated differentially abundant metabolites, the red dots represent up-regulated differentially abundant metabolites, and the black dots represent metabolites with no difference. (d) Venn diagram of differentially abundant metabolites in developing seeds of B. juncea. DAP = days after pollination.

Metabolite Accumulation Patterns
To investigate the dynamic accumulation of the differential metabolites, these compounds were quantified from the calibration curve of corresponding or similar standard compounds. In this study, 92 metabolites could be relatively quantified using 9 commercial standards in B. juncea (Supplementary Table S1). Furthermore, we subjected the remaining 78 metabolites to PCA analysis after omitting the trace contents (<0.01 µg/g FW), which included 22 phenolic acids, 39 flavonoids, and 17 glucosinolates. The PCA results were similar to those obtained by peak-surface analysis (Supplementary Figure S1b) and revealed clear separation between the dark-and yellow-seeded samples as well as the samples from different developmental stages (Supplementary Figure S1b, Supplementary  Table S4). In terms of seed development, the contents of glucosinolates and phenolic acids tended to accumulate in yellow-and dark-seeded B. juncea (Figure 3a,c), while the total flavonoid contents were significantly different (p < 0.01) between the yellow-and dark-seeded B. juncea during seed development (Figure 3b). These results indicate that flavonoids might be involved in the determination of seed coat color in B. juncea.
To determine whether the dark seed coat color is caused by flavonoids, we analyzed 16 soluble flavonoids that were differentially abundant between yellow-and dark-seeded B. juncea seeds throughout development ( Figure 4). Among them, Is-3-O-glucoside-7-Oglucoside (<0.01 µg/g FW) and oenin (0.02 µg/g FW) were omitted because they were present in trace amounts. These common differentially abundant metabolites showed various accumulation patterns during seed development in yellow-and dark-seeded B. juncea, and their levels were higher in the dark variety seeds than in the yellow B. juncea seeds throughout development (Figure 3d Table S4). In terms of seed development, the contents of glucosinolates and phenolic acids tended to accumulate in yellow-and dark-seeded B. juncea (Figure 3a,c), while the total flavonoid contents were significantly different (p < 0.01) between the yellow-and darkseeded B. juncea during seed development (Figure 3b). These results indicate that flavonoids might be involved in the determination of seed coat color in B. juncea.  , and glucosinolates (c) in yellow-and dark-seeded B. juncea. The horizontal and vertical coordinates represent the content of the constituent and the developmental stage, respectively. Error bars represent the SDs of the replications among the biological samples in this study. Statistical significance was calculated using Student's t-test: *, p < 0.05; **, p < 0.01, respectively. (d) The accumulation patterns of flavonoid compounds in yellow-and dark-seeded B. juncea. Red indicates high abundance; blue indicates relatively low metabolite abundance. The bar represents the log 2 contents (µg/g FW). DAP, days after pollination. The red asterisk indicates significantly different metabolites at three stages. DP2, procyanidin B; DP3, procyanidin C; DP4, procyanidin D. and their levels were higher in the dark variety seeds than in the yellow B. juncea seeds throughout development (Figure 3d). For example, the larger amount of components, including epicatechin (139.975 µg/g FW), [DP2]-1 (36.52 µg/g FW), [DP2]-2 (10.495 µg/g FW), and [DP3]-1 (7.893 µg/g FW), were abundant in the dark-seeded variety at 30 DAP (Supplementary Table S4, Supplementary Figure S3). Our results indicate that the differentially abundant epicatechin and PAs might be associated with the seed coat coloration of B. juncea.   Supplementary Table S1. DAP, days after pollination. DP2, procyanidin B; DP3, procyanidin C; DP4, procyanidin D.

Transcriptome Analysis of Flavonoid Biosynthesis Genes in B. juncea
Flavonoids are responsible for the coloration of fruits, flowers, and seeds, and some genes regulating flavonoid biosynthesis have been found in Brassica species [35,55]. Therefore, we performed RNA-seq analysis to elucidate the differences in the expression patterns of flavonoid biosynthesis genes during seed development in yellow-and dark-seeded B. juncea. Based on the flavonoid biosynthesis gene and protein sequences of 31 A. thaliana obtained from the TAIR10 database (Supplementary Table S5), a total of 101 homologous genes were identified in this study (Supplementary Table S5, Supplementary Figure S4), and the functionally relevant amino acid residues of these identified genes had highly conserved positions (Supplementary Table S6). Subsequently, we investigated their transcript levels during seed development in yellow-and dark-seeded B. juncea using RNA-seq analysis (BioProject ID: PRJNA723131). To validate RNA-Seq results, 12 flavonoid biosynthesis genes with seed coat color formation were selected for RT-qPCR analysis, which had high expression levels with obvious differences (Figure 5b). Linear regression analysis indicated that the fold changes for gene expression investigated by RT-qPCR and RNA-Seq data were significantly positively correlated (R2 = 0.75; Figure 5a), suggesting that these results were reliable. their transcript levels during seed development in yellow-and dark-seeded B. juncea using RNA-seq analysis (BioProject ID: PRJNA723131). To validate RNA-Seq results, 12 flavonoid biosynthesis genes with seed coat color formation were selected for RT-qPCR analysis, which had high expression levels with obvious differences (Figure 5b). Linear regression analysis indicated that the fold changes for gene expression investigated by RT-qPCR and RNA-Seq data were significantly positively correlated (R2 = 0.75; Figure 5a), suggesting that these results were reliable.   Table S6), supporting that these homologous copies are likely not related to seed coat color formation in B. juncea. In addition, BjuFLS3_A showed a contrary expression pattern, which had relatively high expression levels in the yellow-seeded variety, and other FLSs were highly expressed in the dark-seeded variety ( Figure 5). Among them, structural genes (BjuCHSs, BjuCHIs, and BjuF3Hs), also named the EBGs [36], play a role in the common precursors of flavonoids [31][32][33]48], indicating that these structural genes may determine the accumulation levels of the common precursors of flavonoids. However, structural genes, BjuDFRs, BjuANSs, and BjuBANs, leading to anthocyanin and proanthocyanidin [35][36][37][38] were expressed at very low levels in yellowseeded B. juncea seeds, and they were expressed at much higher levels in dark-seeded B. juncea seeds; meanwhile, the transcription levels of BjuTT19 also displayed an obvious difference, with a continuous decline during seed development in both the yellow-and dark-seeded varieties ( Figure 5). Our findings suggest that trace PA accumulation in the yellow seeds is associated with the low expression level of flavonoid biosynthesis genes, in accordance with previous results [35,56].
Previous results showed that LBGs were also positively activated by MBW complexes [39][40][41]48]. In this study, the dynamic expression of BjuTT2, BjuTT8, and BjuTTG1 showed similar expression patterns with LBGs ( Figure 6). Among them, studies have also shown that TT8 is required to positively activate the transcription of LBGs, which is involved in the seed coat color of B. juncea [8], B. rapa [12], and B. napus [7,11,13]. However, the yellow-seeded trait had different variations in Brassica species. Therefore, we further compared the sequences of TT8 from dark-seeded B. nigra and yellow-and dark-seeded B. rapa, B. oleracea, B. napus, B. juncea, and B. carinata ( Figure 6, Supplementary File S1). The results showed that the length of the TT8 varied among Brassica species from 3548 bp in dark-seeded B. rapa (BraTT8A-d) to 4826 bp in yellow-seeded B. juncea (BjuTT8A-y). An insertion of 1276 bp was located in exon 7 of the yellow-seeded B. juncea sequence (BjuTT8A-y) (Figure 6a, Supplementary File S1). Five SNPs were also identified in the exons (exons 1, 2, 3, 6, and 7) of the TT8A gene between yellow-and dark-seeded B. rapa, but only one SNP was detected between the yellow-and dark-seeded B. napus (Figure 6a,  Supplementary Figure S5). Many other SNPs and indels were also found among the TT8A sequences (Supplementary Figure S5), but the extron sequences of TT8B and TT8C copies were highly conserved among these Brassica species. For example, one SNP was detected in the seventh extron of BjuTT8B and BcaTT8B between B. juncea and B. carinata, and one SNP in first extrons of BolTT8C and BcaTT8C was detected in yellow-and dark-seeded B. oleracea and B. carinata (Figure 6b,c, Supplementary Figures S6 and S7). Our results support the idea that variants in the TT8A gene might be a crucial factor for seed coat color in Brassica species. Additionally, the gene BjuTT19s, which encodes a protein reported to be involved in procyanidin transport [57,58], also had relatively low expression levels in yellow-B. juncea (Figure 5a), supporting the low accumulation of PAs in the yellow-seeded B. juncea (Table 1). Our results thus suggest that genes for the accumulation of PAs play an important role in seed coat color of B. juncea. To further explore the regulatory network of flavonoids in yellow-and dark-seeded B. juncea, we analyzed the identified flavonoids using UPLC-HESI-MS/MS and the expression patterns of differentially expressed genes related to flavonoid biosynthesis by means of RNA-seq analysis (BioProject ID: PRJNA723131). At the first dedicated step of the flavonoid biosynthetic pathway, the expression levels of BjuCHIs, BjuF3Hs, and BjuTT7s showed different expression patterns in yellow-and dark-seeded B. juncea (Figure 5a), and the corresponding metabolites (i.e., naringenin, eriodictyol, quercetin isorhamnetin, and its corresponding derivatives) also had relatively low accumulation levels in the yellow seeds (Supplementary Table S4), indicating that these EBGs play important roles in promoting the production of enzymes to catalyze the metabolism of the substrates. Im- To further explore the regulatory network of flavonoids in yellow-and dark-seeded B. juncea, we analyzed the identified flavonoids using UPLC-HESI-MS/MS and the expression patterns of differentially expressed genes related to flavonoid biosynthesis by means of RNA-seq analysis (BioProject ID: PRJNA723131). At the first dedicated step of the flavonoid biosynthetic pathway, the expression levels of BjuCHIs, BjuF3Hs, and BjuTT7s showed different expression patterns in yellow-and dark-seeded B. juncea (Figure 5a), and the corresponding metabolites (i.e., naringenin, eriodictyol, quercetin isorhamnetin, and its corresponding derivatives) also had relatively low accumulation levels in the yellow seeds (Supplementary Table S4), indicating that these EBGs play important roles in promoting the production of enzymes to catalyze the metabolism of the substrates. Importantly, we found that the epicatechin and procyanidin oligomers ([DP2]-1, [DP2]-2, [DP3]-2, [DP3]-1, and [DP4]) had trace content (< 0.01 µg/g FW) in yellow seeds. Transcriptome data showed that BjuDFR, BjuANS, and BjuBAN had almost no expression in the yellow seeds, indicating that these genes might be suppressed in promoting yellow seed pigmentation. We therefore speculated that the lower expression levels of the LBGs (BjuDFR, BjuANS, and BjuBAN) resulted in a deficiency of epicatechin and PAs, which predominantly determined the seed coat color formation in B. juncea. Meanwhile, BjuTT19 encoding glutathione transferase were remarkably down-regulated or not expressed in the yellow-seeded variety ( Figure 5), indicating that the accumulation of PAs is inhibited in the yellow-seeded variety. Overall, a diagram of the flavonoid metabolomics pathways in B. juncea shows that the expression patterns of most flavonoid biosynthesis genes were consistent with the metabolite accumulation patterns (Figure 7), while epicatechin and PAs contribute to dark seed coat color formation in B. juncea.  Table 1. 055). The color represents relative gene expression levels.

Discussion
Brassica juncea, a crop plant cultivated worldwide, produces black, brown, or yellow seeds; these colors are produced by flavonoids, the largest group of specialized metabolites, including flavonols, flavones, isoflavones, flavanols, and PAs [ B. rapa, B. nigra, and B. oleracea) and three amphidiploid species (B.  juncea, B. napus, and B. carinata), which are closely interrelated and well understood by  Table 1. 055). The color represents relative gene expression levels.

Discussion
Brassica juncea, a crop plant cultivated worldwide, produces black, brown, or yellow seeds; these colors are produced by flavonoids, the largest group of specialized metabolites, including flavonols, flavones, isoflavones, flavanols, and PAs [6,8,35,42]. To date, many phenolic compounds have been investigated in the Brassica species [13,14,16,[59][60][61], but they are still not well understood in B. juncea seeds. Cultivated Brassica species include three diploid species (B. rapa, B. nigra, and B. oleracea) and three amphidiploid species (B. juncea, B. napus, and B. carinata), which are closely interrelated and well understood by the theory of the 'triangle of U' [62]. Therefore, these results provide important clues to discover the biochemical mechanisms of the flavonoid biosynthesis pathway in B. juncea. In addition, most metabolites involved in the seed coat color, such as epicatechin, isorhamnetin, kaempferol, quercetin, and their derivatives, are widely detected in Brassica crops [13,14,16,27,52,[59][60][61]63], suggesting that they might share a common metabolic pathway. In this study, a total of 1440 major metabolic base peak chromatograms using UPLC-HESI-MS/MS analysis, and 236 metabolites were well-identified in the developmental seeds of B. juncea, including 31 phenolic acids, 47 flavonoids, 17 glucosinolates, 38 lipid compounds, 69 other hydroxycinnamic acid compounds, and 34 unknown compounds (Figure 1e-g, Supplementary Table S1). Previous studies highlighted the role of PAs in the seed coat of the Brassica species [13,14,16,27,52,[59][60][61]. Based on levels of the identified metabolic compounds, yellow-and dark-seeded B. juncea could be distinguished (Supplementary Figure S1). Among them, we found that flavonoids are the predominately different metabolites between the yellow-and dark-seeded varieties (Table 1, Figure 3b), suggesting that they were involved in the modulating seed coat color formation of B. juncea. Meanwhile, we found that the greatest number of differentially expressed compounds was detected at 30 DAP (Figure 2b), indicating that most different metabolites might be catalyzed in this stage [13]. Importantly, 36 differential metabolic compounds were significantly accumulated throughout seed development (Figure 2d, Table 1), suggesting that they might be closely associated with the formation of seed coat color in B. juncea. Among them, 17 common differentially abundant metabolites were grouped into flavonoids (Table 1) [64,65]; epicatechin and PAs have often been reported to be involved in the seed coat color of Brassica species [11,13,14,16]. In addition, 13 of 36 unknown compounds were detected and showed significant differences in the yellow-and dark-seeded B. juncea seeds ( Table 1, Supplementary Table S1). In summary, our findings found that epicatechin and its corresponding derivatives were important in the seed coat color variation of B. juncea seeds, but additional studies are needed to verify whether these unknown metabolites are responsible for the seed coat color of B. juncea seeds.
In plants, flavonoids are widely distributed and contribute to coloration, which is well understood in Arabidopsis and Brassica [10,30,35,37,38,43,52,57,66]. The release of the genomic data for B. juncea [1,41], B. rapa [67,68], and B. nigra [69] offered us a chance to elucidate the flavonoid biosynthetic pathway at a genome-wide level in B. juncea. In this study, 101 flavonoid biosynthesis genes were identified based on the released B. juncea genome sequence (Supplementary Table S5). In addition, the color of the flowers, fruits, and seeds are mainly determined by the flavonoids, which are the most widely used metabolic pathway in plants [10,55]. Herein, we compared the expression patterns of these flavonoid biosynthesis genes during seed development in yellow-and dark-seeded B. juncea ( Figure 5). In general, 25 genes were expressed at low levels in both yellowand dark-seeded B. juncea, while 76 genes showed higher expression in the dark-seed variety than in the yellow-seed variety (Figure 5a). Interestingly, we found that transcripts of the structural genes (BjuCHSs, BjuCHIs, BjuF3Hs, and BjuTT7s), the first dedicated step of the flavonoid biosynthetic pathway, were more abundant in dark-seeded than in yellow-seeded B. juncea, while the LBGs, such as BjuDFRs, BjuANSs, and BjuBANs, had hardly or almost no expression in yellow-seeded B. juncea (Figure 5a), indicating that these were important factors affecting seed coat formation in B. juncea [35]. Similarly, numerous studies on the mechanism of seed coat color in Brassica species have shown that seed coat color is mainly influenced by the lower expression of the genes that regulate the flavonoid biosynthesis pathway [7,11,13,14,70,71]. Therefore, we speculate that these genes used for PA biosynthesis lead to the different seed coat color variation in B. juncea.
To examine flavonoid accumulation during seed development, we performed further correlation analysis between the transcriptome and metabolite profiling, and the expression patterns of some flavonoid biosynthetic genes showed a high association with the accumulation of some flavonoids and their derivatives. Previous studies showed that TT8 is a central component of MYB-bHLH-WD repeat complexes and is essential in the flavonoid biosynthetic pathway [8,12,43]. As expected, BjuTT8 also had significantly higher expression levels in the dark-seeded variety than in yellow-seeded variety of B. juncea (Figure 7), consistent with previous findings that TT8 accounted for increasing levels of anthocyanin accumulation in plants [12,35,38,43]. In addition, recent studies suggest that BjuTT8 could promote the expression of BjuDFR and BjuTT19 thus increasing anthocyanin accumulation and conferring the purple color of B. juncea leaves [38], which also acts as a major regulator of flavonoid biosynthesis pathways in Brassica plants [42,[72][73][74]. Herein, BjuTT8 and LBGs (BjuDFRs, BjuANSs, and BjuBANs) are also closely related to epicatechin and its derivatives in this study (Figure 7). In Chinese cabbage, the yellow-seeded trait is due to a deletion of TTG1 [75] and in yellow sarson, a trait is due to the insertion of a transposable element in TT8 [12]. Therefore, we further investigated the variation of TT8 structure and identified substantial variation among the sequences in B. rapa, B. napus, and B. juncea (Figure 6a, Supplementary Figures S5-S7). In fact, the alignment of the BjuTT8A gene revealed an insertion of 1276 bp in exon 7, similar to published results (Supplementary Figure S5) [8], but there is no mutation in BjuTT8B gene members, indicating that TT8 homologous genes may have different functions in different B. juncea varieties. In addition, a large insertion was detected in the yellow-seeded B. rapa [12], while we found the obviously SNP variation between the yellow-and dark-seeded plants by alignment of BraTT8A (Figure 6a, Supplementary Figure S5) but no presence of BnaTT8A, suggesting that TT8A showed different functional variations in B. rapa, B. juncea, and B. napus. However, the sequences of the TT8 genes were highly conserved in B. nigra, B. oleracea, B. napus, and B. carinata (Figure 6b,c, Supplementary Figures S6 and S7). Therefore, our findings propose that TT8A genes play important roles in the stability of the seed coat color among the Brassica species, which has undergone functional differentiation during the evolution of the Brassica genome.
In this study, we also identified other differences that shed light on seed color in B. juncea. For example, four copies of the transport gene BjuTT19 were also significantly up-regulated in dark-seeded B. juncea relative to yellow-seeded B. juncea ( Figure 5), and the low levels of metabolites were detected in yellow-seeded B. juncea (Supplementary  Table S2). Hence, we hypothesized that low expression levels of BjuTT19 may be involved in the seed coat color through the inhibition of anthocyanin/PAs transport in the flavonoid biosynthesis pathway of B. juncea. In addition, the metabolites oenin and tulipanin were preliminarily identified in the developing seeds using accurate MS and MS/MS spectral data, which was catalyzed by the flavanone 3',5'-hydroxylase (F3 5 H) [76]. Previous results have shown that multiple F3 5 H evolutions from F3 H have occurred in dicotyledonous plants [76] and have shown that F3 H and F3 5 H belong to cytochrome P450-dependent enzymes (CYP), a diverse class of heme-containing oxidases that are present in all types of organisms [77], indicating that the F3 H may have a redundant role in the control of the 3'5'-hydroxylated flavonoids of Arabidopsis and Brassica plants. Therefore, abundant metabolites were detected (Supplementary Table S1) and need to be further identified. In summary, our results improve our understanding of the molecular mechanism of seed coat color formation in Brassica species and demonstrate that combining metabolome and transcriptome analysis is an effective method for identifying key genes involved in flavonoid biosynthesis.

Plant Materials and Culture Conditions
Eight yellow-and dark-seeded B. juncea accessions (Figure 1a, Supplementary Table S8) were used in this study and were obtained from The Research Institute of Oil Crops in Guizhou. All eight accessions were sown in late September 2019 and grown under normal environmental conditions in the experimental field in Guiyang, Guizhou, China, located at the coordinates of 26 • 44 N, 106 • 43 W and at an altitude of 1250 m. Each accession was grown in randomized complete blocks with three rows at each site (0.4 m between rows and 0.2 m between plants) under normal field conditions. To investigate the dynamic metabolites of the seeds during maturation, the flowers were marked with different colored wool to keep the seeds at the same development stages together. At 20, 30, and 40 days after pollination (DAP), the immature seeds were gently harvested from five individual plants and pooled into a 5-mL centrifuge tube, immediately frozen in liquid nitrogen, and then stored at −80 • C until total RNA and crude metabolite extraction.

Metabolite Extraction and Analysis
Metabolites were extracted using the previously described methods [14] with slight modifications. Fresh seeds (100 mg) were weighed in pre-cooled microfuge tubes with liquid nitrogen (2-mL, Eppendorf, Germany), quickly crushed into powder, and homogenized in an aqueous solution of formic acid (0.1% (v/v)) in aqueous methanol (1 mL; 80% (v/v)). Samples were then sonicated (KQ-100E, Kunshan, China) for 1 h, and the crude extracts were centrifuged for 15 min at 10,000× g. Subsequently, the precipitate was extracted again using the same method. Finally, the pooled supernatants were filtered via a 0.22-µm nylon filter and analyzed using ultra-high-performance liquid chromatography-heated electrospray ionization-tandem mass spectrometry (UPLC-HESI-MS/MS). All samples were analyzed in four biological duplicates.
After filtering, 10 µL samples were analyzed using a Dionex UltiMateTM 3000 UHPLC system (Thermo Fisher Scientific, Waltham, MA, USA) connected to a Thermo Scientific Q-Exactive System equipped with an S-Lens ionizer source (Thermo Scientific, USA) in negative mode. An Acquity UPLC BEH C18 chromatography column (2.1 i.d. × 150 mm, 1.7-µm particle size) (Waters, Ireland) was used with a guard column (Acquity UPLC BEH The spectra were recorded using full scan mode, covering a mass range from m/z 100 to 1500. The operation parameters were as follows: source voltage, 3.5 kV; sheath gas, 35 (arbitrary units); auxiliary gas, 10 (arbitrary units); sweep gas, 0 (arbitrary units); and capillary temperature, 350 • C.
All raw data were converted for free for use in the MS data analysis tool using the ABF (Analysis Base File) converter (http://www.reifycs.com/AbfConverter/index.html, accessed on 22 March 2020). The datasets were then analyzed using MS-DIAL version 4.18 software with a mass bank negative database (http://prime.psc.riken.jp/compms/ msdial/main.html#MSP, accessed on 22 March 2020) [78]. Meanwhile, the raw UPLC-HESI-MS/MS data were further analyzed using Xcalibur 3.1 software. Discernible base peak areas from the built-in analyst quantitation wizard were manually corrected. Finally, the compounds were confirmed by comparing their retention times, accurate MS, and MS/MS spectral data together with the commercial standards and previously reported information [14,25]. Based on their mass spectrum, the concentrations of the identified compounds were quantified using calibration curves obtained with the corresponding or similar standards in this study.

Statistical Analysis
Principal component analysis (PCA) and t-tests were performed on the web-based sever Metabolite Sets Enrichment Analysis 4.0 (MSEA 4.0 or MetaboAnalyst 4.0; http: //www.metaboanalyst.ca, accessed on 22 March 2020) [79]. Multiple regression orthogonal partial least-squares discriminant analysis (OPLS-DA) was performed using SIMCA V14.1 (https://umetrics.com/, accessed on 22 March 2020) with the default parameters. The significantly differential metabolites were screened using the variable importance in the projection (VIP) value of the first principal component in the OPLS-DA model combined with fold change (FC). The metabolites with variable importance in projection value (VIP) ≥ 1, p value < 0.05, and fold change (FC) of ≥1.6 (up-regulated) or ≤0.625 (downregulated) were considered to be differential metabolites. Volcano plots were used to filter metabolites of interest based on the log2(FC) of the metabolites [80,81]. Data were expressed as the mean ± standard deviation (SD) of four biological replicates.

Identification of Flavonoid Biosynthesis Genes in B. juncea
A total of 31 gene sequences in A. thaliana related to flavonoid biosynthesis were downloaded from The Arabidopsis Information Resource (TAIR) database (https://www. arabidopsis.org/, accessed on 22 July 2020). The B. juncea reference genome sequence (Bju15: Brassica juncea V1.5) and gene sequences from BRAD (http://brassicadb.cn, accessed on 22 July 2020) were used to identify the flavonoid biosynthesis genes in B. juncea. We used the flavonoid biosynthesis gene and protein sequences of 31 A. thaliana to align with the B. juncea genome and protein sequences using BLASTn and BLASTp with a cutoff E-value of ≤1 × 10 −20 , respectively. The homologous gene pairs were confirmed through multiple alignments performed using Geneious 4.8.5 software (http://www.geneious.com/, accessed on 22 July 2020; Biomatters, Auckland, New Zealand), which showed the syntenic analysis results across B. juncea and each A. thaliana. The conserved motifs of flaovonoid biosynthesis genes (Supplementary Figure S4) were characterized using Multiple Em for Motif Elicitation (MEME) 4.11.4 (http://meme-suite.org/tools/meme, accessed on 22 July 2020) with the following parameters: any repetitions; maximum 5 motifs; and 6 and 300 residues width of each motif [82]. Genes with an E-value of <1 × 10 −20 were used for further analysis. Eventually, these identified flavonoid biosynthesis genes were confirmed through the knowledge-based identification of pathway enzymes (KIPEs; https://github.com/bpucker/KIPEs, accessed on 22 July 2020) [83] 4.6. Transcriptome Sequencing Analysis of Differentially Expressed Genes in the Flavonoid Pathway To better understand the pathways of flavonoid biosynthesis in B. juncea, we performed BLASTP searches in the Brassica database (B. juncea genome V1.5; http://brassicadb. cn, accessed on 22 July 2020) [1] using the protein sequences related to flavonoid synthesis genes in Arabidopsis [28,30,84] as queries. Subsequently, the expression profiles of flavonoid biosynthesis genes were further revealed using transcriptome analysis based on RNA-seq data. For each developmental stage of each B. juncea line (20, 30 and 40 DAP), total RNA was extracted from the seed samples (approximately 100 mg) stored at −80 • C using an EZ-10 DNAaway RNA Mini-Preps Kit (Sangon Biotech Co., Ltd., Shanghai, China). The RNA quality was validated using 1% agarose gel, and the purity, concentration, and integrity were confirmed using a Nanodrop spectrophotometer (Thermo Fisher Scientific, Inc., Worcester, MA, USA) and an Agilent 2100 Bioanalyzer (Agilent Technologies, Inc., Santa Clara, CA, USA), respectively. The mRNA was enriched using the NEBNext ® Poly(A) mRNA Magnetic Isolation Module, and a cDNA library was constructed using the recommended program, NEBNext®mRNA Library Prep Master Mix Set, following manufacturer's recommendations. The libraries were sequenced on an Illumina HiSeq2000 sequencing platform by Novogene Bioinformatic Technology Co. Ltd. (Tianjin, China). Two biological replicates were used for the transcriptome analysis.
The adapter sequences and unknown or low-quality reads were filtered using Trimmomatic version 0.32 [85], and clean reads were mapped to the B. juncea reference genome sequence [1] (Bju15: Brassica juncea V1.5; http://brassicadb.cn, accessed on 22 July 2020) using HISAT (hierarchical indexing for spliced alignment of transcripts; version 2.1.0) with the default parameters (phred33-p5-sensitive-no-discordant-no-mixed-I 1-X 1000) [86], and the number of mapped reads was quantified using HTseq [87]. Analysis of differential gene expression between the yellow-and dark-seeded varieties were performed with the DESeq2 (version 1. 16 Table S9). Genes with an adjusted p-value that was ≤ 0.05 and fold changes ≥ 2 were considered as differentially expressed. The gene expression profiles were evaluated using FPKM (fragments per kilo base of exon model per million) values. The heatmap was generated using TBtools (version 1.055) [89].

Reverse-Transcription Quantitative PCR (RT-qPCR) Analysis
To validate the transcriptome data and to characterize the flavonoid genes that were differentially expressed in the developing seeds of yellow-and dark-seeded B. juncea, total RNA was isolated from the samples using a DNA away RNA Mini-Prep Kit (Sangon Biotech, Shanghai, China). Subsequently, the cDNAs were synthesized using an RNA PCR Kit (AMV, v3.0) based on the manufacturer's protocols (Takara, Dalian, China). The cDNA was subjected to RT-qPCR analysis using SYBR qPCR SuperMix Plus (NovoStart) on a Bio-Rad CFX96 Real-Time System (Bio-Rad Laboratories, Hercules, CA, USA), as previously described [13]. Tonoplastic intrinsic proteins-41 (TIPS-41) was used as a reference gene to normalize the gene expression levels via the 2 −∆∆Ct method [90]. Three biological replicates were performed for all experiments. The specific primer sequences used in this study were obtained from the RT-qPCR Primer Database [91] and are listed in Supplementary Table S10.

Gene Cloning and Multi-Sequence Analysis
The genomic DNA and open reading frame of TT8 were amplified using the genomic DNA and cDNA of dark-seeded B. nigra and yellow-and dark-seeded B. rapa, B. oleracea, B. napus, B. juncea, and B. carinata as templates, respectively. Subsequently, the sequences of TT8 were subjected to multiple sequence alignments using ClustalW software (version 2.0) with default settings [92]. PCR, cloning of PCR production and the selection of positive clones were performed in three biological repetitions. Positive clones were sequenced using M13F/M13R primer (Tsingke Biological Technology Company, Beijing, China). The specific primer sequences used in this study are listed in Supplementary Table S10.

Conclusions
In this study, 236 metabolite compounds were identified from B. juncea seeds, including 31 phenolic acids, 47 flavonoids, 17 glucosinolates, 38 lipids, 69 other hydroxycinnamic acid compounds, and 34 novel unknown compounds. 35 of them showed significant differences in yellow and dark B. juncea seeds, especially with regard to flavonoids. In addition, 101 homologous flavonoid genes were identified, and their expression patterns were investigated using RNA-Seq and RT-qPCR analysis, most of them showed significant expression levels in yellow and dark B. juncea seeds. Importantly, comparative transcriptome and metabolome analyses of yellow and dark B. juncea revealed high consistent changes in flavonoid biosynthesis genes (BjuDFR, BjuANS and BjuBAN, BjuTT8 and BjuTT19) and the levels of flavonoids (epicatechin and its derivatives). Further, TT8 plays a crucial role in the seed coat color of Brassica species. Our results provide new insights into the understanding of the synthesis and accumulation of flavonoids in B. juncea seeds as well as lay a solid biological foundation for breeding improvements for the Brassica species.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/ijms22137215/s1. Author Contributions: J.L. and C.Q. conceptualized the project; L.L. and K.L. helped to review and edit the manuscript; S.S. and Y.T. carried out the laboratory work and data analysis and wrote the first draft of the manuscript; C.Z. contributed to sample preparation; N.Y. and G.S. verified the analytical methods; C.Z., Y.M., and F.S. carried out the experiments; S.C., R.H. and X.L. cloned genes and analyzed the data. J.L. and C.Q. had overall responsibility of this project and funding acquisition. All authors have read and agreed to the published version of the manuscript.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The data presented in this study are available upon request from the corresponding author. The RNA seq Illumina paired-end reads of the transcriptome for this study have been submitted to NCBI BioProject PRJNA723131.