Genome-Wide Association Study of Phenylalanine Derived Glucosinolates in Brassica rapa

Glucosinolates (GSLs) are sulfur-containing bioactive compounds usually present in Brassicaceae plants and are usually responsible for a pungent flavor and reduction of the nutritional values of seeds. Therefore, breeding rapeseed varieties with low GSL levels is an important breeding objective. Most GSLs in Brassica rapa are derived from methionine or tryptophan, but two are derived from phenylalanine, one directly (benzylGSL) and one after a round of chain elongation (phenethylGSL). In the present study, two phenylalanine (Phe)-derived GSLs (benzylGSL and phenethylGSL) were identified and quantified in seeds by liquid chromatography and mass spectrometry (LC-MS) analysis. Levels of benzylGSL were low but differed among investigated low and high GSL genotypes. Levels of phenethylGSL (also known as 2-phenylethylGSL) were high but did not differ among GSL genotypes. Subsequently, a genome-wide association study (GWAS) was conducted using 159 B. rapa accessions to demarcate candidate regions underlying 43 and 59 QTNs associated with benzylGSL and phenethylGSL that were distributed on 10 chromosomes and 9 scaffolds, explaining 0.56% to 70.86% of phenotypic variations, respectively. Furthermore, we find that 15 and 18 known or novel candidate genes were identified for the biosynthesis of benzylGSL and phenethylGSL, including known regulators of GSL biosynthesis, such as BrMYB34, BrMYB51, BrMYB28, BrMYB29 and BrMYB122, and novel regulators or structural genes, such as BrMYB44/BrMYB77 and BrMYB60 for benzylGSL and BrCYP79B2 for phenethylGSL. Finally, we investigate the expression profiles of the biosynthetic genes for two Phe-derived GSLs by transcriptomic analysis. Our findings provide new insight into the complex machinery of Phe-derived GSLs in seeds of B. rapa and help to improve the quality of Brassicaceae plant breeding.


Introduction
The glucosinolates (GSLs) constitute a group of sulfur-containing and amino acidderived secondary metabolites, which play important roles in plant defense and human health and are present in all plants of the Brassicaceae family [1,2]. Rapeseed is one of the major oil crops grown worldwide, including Brassica rapa (AA, 2n = 20), Brassica napus (AACC, 2n = 38) and Brassica juncea (AABB, 2n = 36). After oil extraction of the seeds, the GSLs remaining in pressed rapeseed meal will significantly reduce its nutritional value [3,4]. Therefore, breeding rapeseed with low content of GSLs in seeds is an indispensable objective.
The GSLs are derived from amino acid-derived oximes that are subsequently decorated with a thioglucose moiety and a sulfate group. Based on the precursor amino acid and the types of elongation and modification to the side chain, GSLs can be classified into three classes, including aliphatic GSLs derived from methionine (Met), valine (Val), leucine (Leu) and isoleucine (Ile), indolic GSLs derived from tryptophane (Trp) and benzenic GSLs derived from phenylalanine (Phe) and tyrosine (Tyr) [5,6]. In B. rapa, the main GSLs are derived from Met, Trp and Phe [7]. Another classification is based on chain elongation or not. Some GSLs are biosynthesized directly from usual protein amino acids, this group includes the Trp-derived indolylmethylGSL and benzylGSL derived from Phe. The steps from amino acid to GSL are known as the core structure biosynthesis. However, the biosynthesis of most GSLs involves an initial chain elongation of the precursor amino acid. This group of GSLs include Phe-derived phenethylGSL (2-Phenylethyl GSL) and the Met-derived GSLs. The biosynthesis of phenethylGSL involves a single round of chain elongation, resulting in one extra CH 2 group in the chemical structure, while the biosynthesis of Met-derived GSLs can involve several rounds of chain elongation. The group of chain-elongated GSLs begin their biosynthesis with removal of the amino group by a branched-chain amino acid aminotransferase (BCAT4), followed by import to the chloroplast by bile acid transporter (BAT5), where the chain elongation takes place. As we know, the chain elongation includes a threestep transformation: condensation with acetyl-CoA by methylthioalkylmalate synthase enzymes (MAM), isomerization by an isopropylmalate isomerase (IPMI, IIL) and oxidative decarboxylation by an isopropylmalate dehydrogenase (IMDH), yield chain-elongated 2-oxo acid. More cycles of elongation processes would happen for Met derivates, as for phenethylGSL, the 2-oxo acid with one more methylene group is exported to the cytosol and finally transamination by BCAT3 to enter the core structure biosynthesis mentioned above [5,6,8,9]. Overall, the difference between biosynthesis of phenethylGSL and ben-zylGSL includes both presence or absence of chain elongation as well as different CYP79 enzymes in the first step of the core structure biosynthesis [6,[10][11][12][13]. Many structural genes are shared among the biosynthesis of phenethylGSL and the Met-derived GSLs [1,10], and it is uncertain what factors control the relative flux to Met-derived GSLs and phenethylGSL.
In plants, the GSL pathway has also been well known as a 'model' for secondary metabolites [10]. To date, several MYB transcription factors have been verified to regulate GSL biosynthesis in different plants. For example, higher expression of BoMYB28 increased the content of Met-derived 4-(methylsulfinyl) butylGSL in broccoli (Brassica oleracea var italica), and targeted silencing of BjMYB28 homologs in Brassica juncea significantly reduced GSLs derived from Met in seeds [14,15]; overexpressing BoMYB29 homologs could upregulate GSL biosynthesis from Met in B. oleracea [16], and BrMYB34, BrMYB51 and BrMYB122 act together to control biosynthesis of Trp-derived GSLs in A. thaliana [17]. Taken together, there is consensus that MYB28, MYB29 and MYB76 regulate the biosynthesis of Met-derived GSLs, while the biosynthesis of Trp-derived GSLs is regulated by MYB34, MYB51 and MYB122. In contrast to the rather well understood regulation of GSL biosynthesis from Met and Trp, the regulation of the biosynthesis of both benzylGSL (without chain elongation) and phenethylGSL (with chain elongation) is poorly understood.
Genome-wide association studies (GWAS) associate phenotypes with genes on a genomic scale. It is verified that metabolite-based GWAS (mGWAS) is a powerful method for the screening of candidate genes for different metabolites and dissect complex traits in plants [18,19]. Moreover, liquid chromatography and mass spectrometry (LC-MS) analysis has been widely used for qualitative and quantitative analysis on GSLs in different plant tissues [20][21][22]. To dissect the genetic basis of two Phe-derived GSLs in B. rapa, we intend to detect QTNs (quantitative trait nucleotides, which link with the tested quantitative trait and are detected by multi-locus random-SNP-effect mixed linear model tools (mrMLM)), candidate genes and regulatory networks associated with benzylGSL and phenethylGSL of B. rapa by LC-MS, GWAS and transcriptome analysis. In this study, numerous QTNs, significant genes and characterized biosynthesis pathways were potentially involved in the regulation of benzylGSL and phenethylGSL in B. rapa. The results will help us to better understand genetic basis of two Phe-derived GSLs in B. rapa and lay a foundation for improving the quality of rapeseed.

The Dynamic Accumulation of Phe-Derived Glucosinolates in B. rapa Seeds
We studied seed development in GSL levels as a function of days after pollination (DAP). The qualitative and quantitative analysis of benzylGSL (C91) and phenethylGSL (C121) were performed in 15 DAP, 25 DAP, 35 DAP, 45 DAP and 50 DAP seeds of BrHG (B. rapa line with high content of total GSLs in seeds) and BrLG (B. rapa line with low content of total GSLs in seeds). BenzylGSL and phenethylGSL were identified in LC-MS data from retention time, m/z of the molecular ion [M-H](-) and the occurrence of the common GSL fragment HSO4(-) at m/z 97 ( Figure 1). The retention time of phenethylGSL (C121 at 6.16 min) was higher than for benzylGSL (C91 at 4.31 min) in agreement with the chemical structures. The content of phenethylGSL in seeds was much higher than benzylGSL. The highest level of benzylGSL existed in 20 DAP seeds of BrLG and declined in pace with seed maturation, on the contrary, the content of benzylGSL increased with seed maturity in BrHG and was with highest level in 50 DAP seeds. No accumulation of phenethylGSL was detected in young seeds (15 DAP and 25 DAP seeds) of B. rapa (Figure 1c,d) The accumulation trends of phenethylGSL content were similar between BrHG and BrLG, which both increased with seed maturity. GWAS and transcriptome analysis. In this study, numerous QTNs, significant genes and characterized biosynthesis pathways were potentially involved in the regulation of benzylGSL and phenethylGSL in B. rapa. The results will help us to better understand genetic basis of two Phederived GSLs in B. rapa and lay a foundation for improving the quality of rapeseed.

The Dynamic Accumulation of Phe-Derived Glucosinolates in B. rapa Seeds
We studied seed development in GSL levels as a function of days after pollination (DAP). The qualitative and quantitative analysis of benzylGSL (C91) and phenethylGSL (C121) were performed in 15 DAP, 25 DAP, 35 DAP, 45 DAP and 50 DAP seeds of BrHG (B. rapa line with high content of total GSLs in seeds) and BrLG (B. rapa line with low content of total GSLs in seeds). BenzylGSL and phenethylGSL were identified in LC-MS data from retention time, m/z of the molecular ion [M-H](-) and the occurrence of the common GSL fragment HSO4(-) at m/z 97 ( Figure 1). The retention time of phenethylGSL (C121 at 6.16 min) was higher than for benzylGSL (C91 at 4.31 min) in agreement with the chemical structures. The content of phenethylGSL in seeds was much higher than benzylGSL. The highest level of benzylGSL existed in 20 DAP seeds of BrLG and declined in pace with seed maturation, on the contrary, the content of benzylGSL increased with seed maturity in BrHG and was with highest level in 50 DAP seeds. No accumulation of phenethylGSL was detected in young seeds (15 DAP and 25 DAP seeds) of B. rapa (Figure 1c,d) The accumulation trends of phenethylGSL content were similar between BrHG and BrLG, which both increased with seed maturity.

QTNs (Quantitative Trait Nucleotides) Detected for Two Phe-Derived Glucosinolates
In this study, 43 and 59 QTNs were detected for benzylGSL (C91) and phenethylGSL (C121) respectively, and distributed on 10 chromosomes and 9 scaffolds (Table S1). Eleven and 10 QTNs were repeatedly detected by two or three different GWAS algorithms for benzylGSL and phenethylGSL on different chromosomes, but the QTNs detected for the two Phe-derived GSLs were all different from each other ( Figure 2, Table S1). Importantly, the contribution rate of QTNs ranged from 0.56% to 70.86%, and 21 QTNs of them were over 10% (9 QTNs for benzylGSL and 12 QTNs for phenethylGSL). It was noticed that one QTN with a high contribution rate for benzylGSL was strikingly detected on A2 chromosome (27.03 Mb), and two other QTNs with high contribution rate for phenethylGSL were detected on A9 (21.77 Mb) and A3 (32.77 Mb) chromosomes ( Figure 2). The distribution of detected QTNs on chromosomes was uneven, and it was noticed that QTNs were gathered on different chromosomes. The QTN number in 1Mb of chromosome were calculated, and 5 hot regions with higher density of QTNs were obtained, which were distributed on A01 of B. rapa. BrLG, B. rapa with low content of GSLs; BrHG, B. rapa with high content of GSLs. The content referred to the micrograms of GSL anion in per gram of fresh seed weight.

QTNs (Quantitative Trait Nucleotides) Detected for Two Phe-Derived Glucosinolates
In this study, 43 and 59 QTNs were detected for benzylGSL (C91) and phenethylGSL (C121) respectively, and distributed on 10 chromosomes and 9 scaffolds (Table S1). Eleven and 10 QTNs were repeatedly detected by two or three different GWAS algorithms for benzylGSL and phenethylGSL on different chromosomes, but the QTNs detected for the two Phe-derived GSLs were all different from each other ( Figure 2, Table S1). Importantly, the contribution rate of QTNs ranged from 0.56% to 70.86%, and 21 QTNs of them were over 10% (9 QTNs for benzylGSL and 12 QTNs for phenethylGSL). It was noticed that one QTN with a high contribution rate for benzylGSL was strikingly detected on A2 chromosome (27.03 Mb), and two other QTNs with high contribution rate for phenethylGSL were detected on A9 (21.77 Mb) and A3 (32.77 Mb) chromosomes ( Figure 2). The distribution of detected QTNs on chromosomes was uneven, and it was noticed that QTNs were gathered on different chromosomes. The QTN number in 1Mb of chromosome were calculated, and 5 hot regions with higher density of QTNs were obtained, which were distributed on A01 (8.

Screening of Candidate Genes for Two Phe-Derived Glucosinolates
According to the B. rapa reference genome (http://39.100.233.196/#/Download/, accessed on 1 June 2021), a total of 3218 genes were obtained near the detected QTNs for the two Phe-derived GSLs. Combined with the expression value of collected genes by RNAseq, 748 DEGs were screened out from 3218 genes and were used as search terms for protein-protein interaction prediction with known GSL biosynthesis genes (Table S2). Six genes interacting with known GSL biosynthesis genes were obtained, including

Discussion
As one of the most important secondary metabolites in plants, people attach importance to GSLs for the benefits for human health and plant defences or drawbacks for animal consumption. As one of the three kinds of GSLs in plants, benzenic GSLs have not received enough attention due to less species and content. In this study, we provide genomic and biochemical data for an increased understanding of two Phe-derived GSLs, including accumulation pattern and screening of previously known as well as novel and unique regulatory genes.
The dynamic accumulation pattern of the two Phe-derived GSLs, benzylGSL and phenethylGSL, was different in B. rapa seeds. According to the content of benzylGSL and phenethylGSL, phenethylGSL was the main Phe-derived GSL in B. rapa seeds. Although the content of benzylGSL in young seeds was higher in BrLG than in BrHG, which was lower in high mature (50 DAP) seeds of BrLG and might be part of the reason for the glucosinolate content difference between BrHG and BrLG. Although the content of two Phe-derived GSLs was low during 15 DAP, the structural genes catalyzing the synthesis of intermediate metabolites and benzenic GSLs were extreme highly expressed in this stage, which might be because the accumulation of metabolites lagged behind the expression of genes. However, another reason for the high expression of the structural genes could be involvement in biosynthesis of the remaining Met and Trp-derived GSLs in B. rapa.
Two different stereoisomers of hydroxylated phenethylGSL were dominant in Barbarea vulgaris, which were induced by Plutella xylostella, and the genes for chain elongation were all activated in different degrees [39]. In this study, BCAT4 (BraA05g027600.3C), which catalyzes the first step of chain elongation, was with higher expression level in BrHG during 35 DAP (Figure 3), and the content of phenethylGSL also started to rise in BrHG in this stage ( Figure 1). Except for BCAT4, other chain elongation genes were not significantly correlated with the content change of phenethylGSL (Figure 3), which might due to the existence of other long-chain GSLs in B. rapa (Table S5). Same to the core structure biosynthesis genes.
The core structure biosynthesis genes for Phe-derived GSLs, Trp-derived indolic GSLs, Met-derived aliphatic GSLs, and the side-chain elongation genes for phenethylGSL and Met-derived GSLs are shared among different GSLs, the transcription factors regulating different GSLs biosynthesis were also shared [6]. It was verified that MYB34, MYB51 and MYB122 regulate indolic GSLs biosynthesis in A. thaliana and B. oleracea [17,40,41]. MYB28 was verified controlling aliphatic GSLs biosynthesis in B. juncea and B. napus [14,23], which also potentially regulated benzylGSL biosynthesis in leaf and stem of Sinapis alba [25]. MYB29 was responsible for aliphatic GSLs biosynthesis in B. oleracea [16,42], and also controlled root phenethylGSL variation in B. napus [28]. MYB34 and MYB29 were induced by P. xylostella and responsible for the biosynthesis of phenethylGSL in B. vulgaris [39]. In this study, these known genes were also obtained around linked QTNs on chromosome A02, A03 and A08 chromosomes (Table S3), indicating that these genes' members might also regulate the synthesis of Phe-derived GSLs in seeds of B. rapa. In addition, the contents of nine other GSLs (two indolic GSLs and seven aliphatic GSLs) were also calculated in BrLG and BrHG of B. rapa, the contents and accumulation dynamics were diverse from each other (Table S5). How these regulators coordinate the metabolism of various GSLs needs further study.
Quantitative trait locus (QTL) mapping has been done on benzenic GSL content in B. rapa leaves, and three QTLs for phenethylGSL were located on chromosome A04 and A07 [7]. In this study, more loci were detected in other chromosomes except for A04 and A07, which can be attributed to the different organs used for measurement of GSL content and different methods used for mapping. Loci for GSLs were also identified in B. napus seeds and leaves by GWAS, and a serial of candidate genes were screened out for GSLs metabolism [43], 9 homologous genes of which were also obtained in our study, including (BraA04g022880.3C). As the candidate genes for GSL biosynthesis in B. napus were the result of mapping on total GSLs' content [43], these genes might be also involved in regulating the Phe-derived GSLs biosynthesis. Furthermore, new candidate genes for Phe-derived GSLs biosynthesis were found in this study based on the distinct QTNs for Phe-derived GSLs, transcriptome analysis and protein-protein interaction prediction, which needed further functional verification. Our research provided novel candidate genes for Phederived GSL accumulation in B. rapa seeds and broadened the understanding on different GSL metabolism.

Plant Materials and Sampling
One hundred fifty-nine B. rapa lines were collected from China and other countries (Table S6), which were cultivated in Xining (Qinghai, China, N36 • 43 , E101 • 45 ) in 2018 and 2020 from March to August and in Xiema (Chongqing, China, N29 • 76 , E106 • 38 ) in 2019 from October to April. The leaf was sampled from one-month-old seedling of each line. The developing seeds of 35 days after pollination (DAP) were collected and mixed from 5 or more individuals for each line, furthermore, the developing seeds of 15, 25, 35, 45 and 50 DAP were also collected with two replicates in two B. rapa lines with high (BrHG) and low (BrLG) content of GSLs (the total content of GSLs), which were all quickly frozen in liquid nitrogen and were stored in a −80 • C ultra-low temperature refrigerator for long-term storage.

Glucosinolates Extraction
The raw metabolites were extracted from seed as previously described [44,45] with minor modifications. In brief, 0.1 g of fresh seeds stored at −80 • C were quickly weighed and crushed into powder using high-throughput tissue grinder (Tissuelyser-192, Shanghai, China). Then 1 mL of extracting solution (80% aqueous methanol with 0.1% formic acid) was added and homogenized by vortex for 10 s. Then, the homogenized extraction buffer were treated with sonication (KQ-100E, Kunshan, China) at 4 • C for 1 h, followed by centrifugation at 8000 g at 4 • C for 30 min. In addition, the above process was repeated again on the precipitate. Finally, the two liquid supernatants were pooled and filtered with a 0.22-µm nylon filter for UPLC-HESI-MS/MS analysis. All experiments were performed at least three replicates for per accession.

Data Processing and Glucosinolate Identification
The UPLC−HESI−MS/MS data were analyzed using MS-DIAL 4.18 software with mass bank negative database (http://prime.psc.riken.jp/compms/msdial/main.html# MSP, accessed on 2 April 2022) [46], which were automatically converted by ABF (Analysis Base File) converter (http://www.reifycs.com/AbfConverter/index.html, accessed on 12 March 2020). In addition, the UPLC−HESI−MS/MS data were collected by Xcalibur 3.1 software and used to identify the compounds based on their characterized permanents, including retention times (RTs), accurate MS and MS/MS spectral data, together with the commercial standards and previously reported information [20,[47][48][49]. Meanwhile, the unequivocally identified GSLs were quantified as our previously described, and sinigrin was used as a reference standard for drawing standard curve [44,45].

Transcriptome Analysis
The developing seeds of BrHG and BrLG during 15 [50] using Hisat2 [51]. Novel transcripts were predicted by StringTie [52], and transcript levels were calculated as FPKM (Fragments Per Kilobase of transcript sequence per Millions base pairs) with the featureCounts tool in Subread [53].

mGWAS Analysis of Phe-Derived Glucosinolates
Two Phe-derived GSLs, benzylGSL (C91) and phenethylGSL (C121), were identified and quantified in 35 DAP seeds of 159 B. rapa lines by UPLC−HESI−MS/MS. Meanwhile, DNA of 159 B. rapa local strains or inbred lines were extracted by Plant Genomic DNA Extraction kit (Tiangen, Beijing, China) and used for RAD-seq, and the reference genome was same as RNA-seq mentioned above. Fifty seven thousand five hundred fifty nine SNP markers evenly distributed in all chromosomes were extracted. STRUCTURE v2.3.4 was used for population structure analysis [54]. mrMLM v4.0 (https://cran.r-project. org/web/packages/mrMLM/index.html, accessed on 1 July 2021) was employed for GWAS using four algorithms, including mrMLM, FASTmrMLM, FASTmrEMMA and ISIS EM-BLASSO [55]. Genes located in 100 Kb downstream and upstream regions of linked quantitative trait nucleotides (QTNs) were obtained. Protein-protein interaction analysis was conducted using B. rapa as a reference in the STRING database (https:// cn.string-db.org, accessed on 12 August 2021) [56]. The known genes participating in Phe-derived glucosinolate biosynthesis were collected in KEGG (Kyoto Encyclopedia of Genes and Genomes) pathway database (map00966: Glucosinolate biosynthesis, https: //www.genome.jp/kegg/pathway.html, accessed on 17 August 2021). Combining mGWAS with the result of transcriptome analysis, candidate genes for Phe-derived glucosinolate metabolic were screened out.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/plants11091274/s1, Table S1: The detected QTNs for benzylGSL and phenethylGSL in B. rapa seeds; Table S2: Protein-protein interaction prediction of novel candidate genes and known GSL biosynthesis genes in B. rapa; Table S3: FPKM value of the biosynthesis genes and novel candidate genes associated with Phe-derived GSLs in BrHG and BrLG of B. rapa; Table S4: Potential structural genes involved in Phe-derived GSL biosynthesis in B. rapa; Table S5: The content of GSL derivatives in developmental seeds of BrLG and BrHG in B. rapa (µg/g FW); Table S6: The information of 159 B. rapa accessions, and benzylGSL and phenethylGSL contents in different environments.