Molecular Sciences Genetic Analysis of Health-related Secondary Metabolites in a Brassica Rapa Recombinant Inbred Line Population

The genetic basis of the wide variation for nutritional traits in Brassica rapa is largely unknown. A new Recombinant Inbred Line (RIL) population was profiled using High Performance Liquid Chromatography (HPLC) and Nuclear Magnetic Resonance (NMR) analysis to detect quantitative trait loci (QTLs) controlling seed tocopherol and seedling metabolite concentrations. RIL population parent L58 had a higher level of glucosinolates and phenylpropanoids, whereas levels of sucrose, glucose and glutamate were higher in the other RIL population parent, R-o-18. QTL related to seed tocopherol (α-, β-, γ-, δ-, α-/γ- and total tocopherol) concentrations were detected on chromosomes A3, A6, A9 and A10, explaining 11%–35% of the respective variation. The locus on A3 co-locates with the BrVTE1 gene, encoding tocopherol cyclase. NMR spectroscopy identified the presence of organic/amino acid, sugar/glucosinolate and aromatic compounds in seedlings. QTL positions were obtained for most of the identified compounds. Compared to previous studies, novel loci were found for glucosinolate concentrations. This work can be used to design markers for marker-assisted selection of nutritional compounds in B. rapa.

Next to tocopherol, there are many more secondary plant metabolites in the plant metabolome that are suggested to have a nutritional effect. Of particular interest are glucosinolates, sulfur containing plant metabolites with anti-carcinogenic properties [27,28] that form a group of more than 100 plant secondary metabolites present primarily in the Brassicaceae family. Each plant species contains a blend of different glucosinolates [29,30]. This blend is largely responsible for the typical flavor and odor of Brassicaceae species plant products. There are significant differences within the Brassicaceae crop species for their glucosinolate profiles [31]. Glucosinolates are grouped into three chemical classes: aliphatic, indolic and aromatic, according to whether their amino acid precursor is methionine, tryptophan or an aromatic amino acid (tyrosine or phenylalanine), respectively [32,33]. Aliphatic glucosinolates are the most prominent glucosinolates found in Brassica vegetables [34]. The concentration and chemical structure can vary considerably, depending on the genotype, stage of development, tissue type and environmental conditions [35,36]. More than 90 different aliphatic glucosinolates have been identified among plants [30] of which up to 16 are found in B. rapa [37][38][39][40]. QTL mapping of leaf aliphatic glucosinolate loci has been carried out in two doubled haploid (DH) populations of B. rapa, which identified 16 loci controlling aliphatic glucosinolate concentration [39]. So far, 102 genes putatively involved in glucosinolate biosynthesis have been identified by comparative genomic analyses in B. rapa as the orthologues of 52 of such genes in A. thaliana [41].
To obtain unambiguous structural information about a metabolite, Nuclear Magnetic Resonance (NMR), and particularly proton NMR (¹H NMR), is probably among the most common methods, as it is non-destructive and can simultaneously detect all proton-bearing compounds [42]. Although it has a lower sensitivity than Mass Spectrometry (MS) [43], ¹H NMR spectroscopy has previously been used to uncover qualitative and quantitative differences among various cultivars of B. rapa. Different cultivars could be distinguished by the metabolites elucidated, for instance several organic and amino acids, carbohydrates, adenine, indole acetic acid (IAA), phenylpropanoids, flavonoids and glucosinolates [44].
We have used this technique to analyze the genetic variation for a range of (secondary) metabolites in B. rapa seedlings of a recently developed RIL population [45]. In addition, a targeted approach, to detect tocopherols, was used to analyze variation for these compounds in seeds of the same population. As the complete genome sequence of B. rapa is available [46], our analysis will simplify the identification of candidate genes that can be used for genetic modification or marker-assisted breeding for improved nutritional quality of B. rapa.

Seed Tocopherol Concentrations
We analyzed seeds of the parental lines, L58 and R-o-18, and all individual lines of the L58 × R-o-18 RIL population [45] for tocopherol content (Table 1, Figure 1). L58 showed higher levels than R-o-18 for α-, γ- and total tocopherol. Some lines showed a very high α-tocopherol concentration in comparison to the other components. Transgression beyond the parental values was observed for all measured tocopherols except δ-tocopherol (Figure 1), suggesting that both parents contain both positive and negative alleles of genes involved in tocopherol biosynthesis. An example of this is the contrasting alleles found at the two major QTLs for α- and total tocopherol, on A6 and A9, respectively. This observation also indicates a potential for improvement of vitamin E content and tocopherol composition through classical breeding, by combining both positive alleles in one genotype.
Figure 1. From left to right and top to bottom: α-tocopherol (mg/g); β-tocopherol (mg/g); γ-tocopherol (mg/g); δ-tocopherol (mg/g); total tocopherol (mg/g); α/γ tocopherol ratio.

QTL Analysis of Seed Tocopherol Concentrations
Significant variation was observed for all tocopherol components, as indicated by the broad sense heritability (Table 3). Each tocopherol component was subjected to QTL analysis, and QTL related to α-, β-, γ-, δ-, α-/γ- or total tocopherol concentrations were detected on chromosomes A3, A6, A9 and A10 (Table 3, Figure 2). About 45% of the phenotypic variance for α-tocopherol was explained by two QTLs (Al1 and Al2, on chromosomes A9 and A6, respectively). Two QTLs were found for total tocopherol (Toc1 and Toc2), explaining almost 42% of the tocopherol variance. Toc2 co-located with Al1, but Toc1 did not co-locate with Al2, although both mapped to A6. Instead, it co-located with the Ga1 locus for γ-tocopherol. The QTL for δ-tocopherol (De2) mapped to the same region of A9 to which Al1 and Toc2 were also mapped. This region also contains a strong seed coat color QTL (SC1) [45]. The seed color locus, SC1, probably corresponds to the CCR1 gene, a gene involved in lignin biosynthesis [47]. Since there is no reason to suggest a common biochemical basis for the biosynthesis of tocopherol and of the flavonoids contributing to seed color, a close linkage of different genes, rather than one common gene with pleiotropic effects, is the most likely explanation for this co-location.
As the α-tocopherol concentration is highly positively correlated to the total tocopherol concentration and two of their respective QTLs (Al1 and Toc2) map to the same position, the concentration of α-tocopherol, and not of the intermediate γ-tocopherol, appears to give the major contribution to the overall tocopherol concentration. However, the second Toc locus (Toc1) co-locates with the Ga1 QTL for γ-tocopherol on A6. This means that QTL for both tocopherols with the highest concentrations make a major contribution to the genetic variation for total tocopherol concentrations. The absence of a significant correlation between α-, γ- and δ-tocopherol concentrations and the finding that these are controlled by different QTL indicate their independent genetic regulation, which is in agreement with the findings of Marwede et al. [15] in canola (B. napus). Thus, with three independent loci controlling α- and γ-tocopherol, it should be possible to enhance the concentration of both. This will have a negative effect on δ-tocopherol concentration though, since the co-locating De2 and Al1 loci have opposite allele effects. As there are RILs with separated contrasting alleles in this population, we could verify this expectation. The tocopherol analyses of these lines confirm our prediction. A similar antagonistic effect was seen for soybean, where overexpression of the AtVTE3 gene, encoding the tocopherol biosynthetic enzyme 2-methyl-6-phytylbenzoquinol methyltransferase, causes a decrease in seed β- and δ-tocopherol with a proportionate increase in α- and γ-tocopherol [5]. The combination of Al1 and Toc2 alleles from the R-o-18 parent leads to the highest tocopherol concentration in this population.
Table 3. Quantitative trait loci (QTL) related to tocopherol concentration in seeds of the B. rapa L58 × R-o-18 RIL population, including 160 genotypes. "Peak position" indicates the location of the highest LOD score for each QTL. "Flanking markers" gives the names of the markers flanking the QTL confidence interval, based on a one-LOD interval. "% Expl. var." is the percentage of total phenotypic variance explained by individual QTLs. The allelic effect of each QTL is indicated ("effect"), calculated as µA − µB (µ = mean), where A and B are RILs carrying the L58 and R-o-18 alleles, respectively, at the relevant QTL position. Effects are given in mg/g or without unit (for the ratio of α/γ tocopherol). H² is broad sense heritability. For all traits, four replicate samples were measured. For β- and δ-tocopherol, values were too small to calculate the difference.
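The allelic effect defined in the Table 3 caption (µA − µB) can be illustrated with a short calculation. The following is a minimal sketch only: the function names, the "A"/"B" allele coding and the example concentrations are hypothetical, and the percentage of explained variance is approximated here as the between-class share of the total sum of squares at a single marker, which is a simplification and not necessarily the estimate reported by the QTL mapping software.

```python
from statistics import mean


def allelic_effect(values, genotypes):
    """Return mu_A - mu_B for one marker.

    'A' marks RILs carrying the L58 allele and 'B' those carrying the
    R-o-18 allele (hypothetical coding); other codes are ignored.
    """
    group_a = [v for v, g in zip(values, genotypes) if g == "A"]
    group_b = [v for v, g in zip(values, genotypes) if g == "B"]
    if not group_a or not group_b:
        raise ValueError("both allele classes must be present")
    return mean(group_a) - mean(group_b)


def percent_variance_explained(values, genotypes):
    """Between-class sum of squares as a percentage of the total.

    Single-marker approximation, assumed here for illustration only.
    """
    scored = [(v, g) for v, g in zip(values, genotypes) if g in ("A", "B")]
    grand_mean = mean(v for v, _ in scored)
    ss_total = sum((v - grand_mean) ** 2 for v, _ in scored)
    ss_between = 0.0
    for allele in ("A", "B"):
        group = [v for v, g in scored if g == allele]
        if group:
            ss_between += len(group) * (mean(group) - grand_mean) ** 2
    return 100.0 * ss_between / ss_total if ss_total > 0 else 0.0


# Invented trait values for six lines, three per allele class.
concentrations = [10.0, 12.0, 11.0, 6.0, 7.0, 8.0]
alleles = ["A", "A", "A", "B", "B", "B"]
effect = allelic_effect(concentrations, alleles)
explained = percent_variance_explained(concentrations, alleles)
print(f"effect = {effect:.2f}, explained variance = {explained:.1f}%")
```

A positive effect in this convention means that the L58 allele increases the trait value, and a negative effect means that the R-o-18 allele does.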

NMR Results of Seedling Metabolites Detection
To further assess the variation in metabolites present in the B. rapa RIL population, we performed NMR analysis on young seedlings. Usually, an NMR spectrum consists of hundreds of signals. Among these, 17 compounds in the organic/amino acid, sugar/glucosinolate and aromatic regions of the NMR spectra could be annotated by ¹H NMR, and their structures were confirmed using 2D NMR spectroscopy. ¹H NMR data of RIL seedling metabolites were subjected to principal component analysis (PCA) (Figure 2). The score plot of the ¹H NMR spectra showed that the two parental lines were quite distinct, especially in principal component 2 (PC2), which was mainly composed of progoitrin, phenylpropanoids and organic compounds. PC1 mostly corresponded to neoglucobrassicin. L58 had a higher concentration of glucosinolates and phenylpropanoids, whereas the concentrations of sucrose, glucose and glutamate were higher in R-o-18. The major phenylpropanoid was sinapoyl glucose. Correlation analysis showed that the concentrations of several seedling metabolites were highly positively correlated (Table 4).

QTL Analysis of Seedling Metabolites
Genetic analysis of 238 signals detected in the NMR spectra enabled the identification of QTL for 146 signals (Table 5, Figure 3). A strong QTL for a compound belonging to the phenylpropanoids was mapped on A7, explaining 43% of the phenotypic variance. Six QTL contributing to variation for alanine, asparagine, glutamine, isoleucine, threonine and valine were detected, explaining up to 37% of the variance. QTL analysis of the glucosinolate NMR signals detected several significant loci, with the most significant one on A9 for neoglucobrassicin. In total, six QTLs for glucosinolates (progoitrin and neoglucobrassicin) were mapped to A3, A5, A9 and A10, with the ones mapping to A3 and A5 possibly co-locating. Previously, five B. rapa QTLs related to progoitrin were mapped to chromosomes A1, A3, A4, A8 and A10 [39] using a DH population made from parents different from those we used to generate the tested RIL population. The authors used forty-day-old leaves for metabolite analysis, while we used young seedlings, which may even still carry glucosinolates originally present in the seed. The differences in population and sampled material are therefore considerable, and are the most likely reasons for the differences in detected loci. In any case, the QTL on A5 and A9 for progoitrin concentration are new loci that have not been reported previously. Extensive studies on aliphatic glucosinolates in A. thaliana previously identified genes encoding AOP (2-oxoglutarate-dependent dioxygenase) and MAM (methylthioalkylmalate synthase), controlling the modification of the side-chain moiety and its elongation, respectively, as important factors contributing to genetic variation for glucosinolate concentration and composition [48][49][50][51][52]. The regulation of aliphatic glucosinolate biosynthesis enzymes is controlled in Arabidopsis by the R2R3 myb-like transcription factors, MYB28 and MYB29 [53]. The B. rapa orthologues of MYB28 were mapped on A3, A9 and A2, and the orthologues of MYB29 were mapped on A10 and A3 [41]. The progoitrin QTL on A3, with a peak position at 95 cM, co-located with the map positions of MYB28/MYB29; there is also a MAM gene in this region.
QTL for the essential amino acids, isoleucine and valine, are co-located on A3 and A4. The isoleucine biosynthesis pathway runs almost parallel to valine biosynthesis, except for its first steps, which involve a threonine deaminase and dehydratase. These loci possibly correspond to the genes encoding the biosynthetic threonine dehydratase (TD) isozyme, similar to what has been isolated from tomato and potato [54,55]. Arabidopsis gene AT3G10050, the threonine dehydratase biosynthetic gene, has a syntenic paralog in B. rapa on A3, where the isoleucine and valine QTL co-located [56]. Non-essential amino acids, such as alanine, asparagine and glutamine, are as important to the human body as the essential amino acids. Eight QTL for non-essential amino acids were identified in this RIL population. These were all independent, except for one QTL on A7, which was shared between alanine and glutamine. One of the glutamate QTL was also mapped to this same region on A7. As glutamate is the substrate for glutamine synthesis and the α-amino group of glutamate can be transferred to pyruvate to form alanine [57], this locus is likely to contain a gene involved in the regulation of all three compounds, most likely in the upstream, common part of their biosynthesis pathway.

Plant Material
The RIL population was derived from a cross between two genotypes belonging to two distinct morphotypes, Cai Xin and Yellow Sarson; both early flowering and self-compatible. The Cai Xin parent is L58, a vegetable type originating from China (B. rapa ssp. parachinensis). The other parent, R-o-18, is a doubled haploid Yellow Sarson oil type line (B. rapa ssp. trilocularis) originating from India. This population has been described [45].

Seed Preparation for HPLC
F7 seeds derived from one plant per RIL of the L58 × R-o-18 population were used for tocopherol measurement (two replicate plants per line with two technical replicates from each plant). For the tocopherol extraction, 10-40 mg seeds were ground in 2-mL reaction tubes with a Geno/Grinder 2000 (SPEX-Sample Prep, Metuchen, NJ, USA) using n-heptane and 3.0-4.0 mm metal beads. The samples were incubated at −20 °C for 2 h. Further applications and HPLC analyses were performed as described [58][59][60]. Quantification of the tocopherols was done by fluorescence detection (excitation at λ = 290 nm, emission at λ = 328 nm). To identify the individual tocopherols, the retention times were compared with standard substances from Merck's tocopherol kit (Merck, Darmstadt, Germany). Total tocopherol content was calculated as the sum of α-, β-, γ-and δ-tocopherol.

Seedling Preparation for NMR Analysis
Thirty seeds per RIL of the B. rapa L58 × R-o-18 population were used. Seeds were surface sterilized with 70% ethanol (v/v) for 30 s, followed by agitation for 5 min in sodium hypochlorite (2.0% active chlorine). After three rinses in sterile distilled water, 30 seeds of each individual (for every experiment) were placed in 15 × 90 mm petri dishes, each containing 20-25 mL half strength MS salts and vitamins, without sucrose and solidified with 0.8% (w/v) agar. Petri dishes were placed vertically in a growth chamber maintained at 25 °C with a 16 h light/8 h dark photoperiod at a light intensity of 60 µE m⁻² s⁻¹. Five-day-old seedlings without roots were harvested and freeze-dried. Twenty milligrams of seedlings (dry weight) were extracted with a mixture of 500 μL methanol-d₄ and 500 μL D₂O (KH₂PO₄ buffer, pH 6.0) containing 0.05% TSP (trimethyl silyl propionic acid sodium salt, w/v) by ultra-sonication for 20 min. After centrifugation, 800 μL supernatant was transferred to an NMR tube. ¹H NMR spectra were recorded at 25 °C on a 600 MHz Bruker AV600 spectrometer equipped with a cryoprobe, operating at a proton NMR frequency of 600.13 MHz. CD₃OD was used as the internal lock. Each ¹H NMR spectrum consisted of 128 scans using the following parameters: TD = 51,200, spectrum width = 16.02 ppm, 0.25 Hz/point, pulse width (PW) = 30° (6.6 μs), acquisition time = 1.70 s and relaxation delay (RD) = 2.0 s. A pre-saturation sequence was used to suppress the residual H₂O signal with low power selective irradiation at the H₂O frequency at δ 4.869 (2915.9 Hz) by 60.59 dB during the recycle delay. Free Induction Decays (FIDs) were Fourier transformed with LB = 0.3 Hz, and the spectra were zero filled to 32 K points. The resulting spectra were manually phased, baseline corrected and calibrated to TSP at 0.0 ppm, using Topspin (version 2.1, Bruker).
The ¹H NMR spectra were automatically reduced to an ASCII file. Spectral intensities were scaled to the internal standard (TSP) area and reduced to integrated regions of equal width (0.04 ppm) corresponding to the region of δ 0.3-δ 10.0. The regions of δ 4.75-δ 4.90 and δ 3.28-δ 3.34 were excluded from the analysis, because of the residual signals of HDO and CD₃OD, respectively. Bucketing was performed by AMIX software (Bruker). Principal component analysis (PCA) was performed with the SIMCA-P software (v. 12.0, Umetrics, Umeå, Sweden) with scaling based on the Pareto method.
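The bucketing and scaling steps described above can be sketched in a few lines. This is an illustrative approximation under stated assumptions, not the AMIX or SIMCA-P implementation: the function names and the tiny example spectrum are hypothetical, points falling inside an excluded region are simply dropped rather than whole buckets being removed, and Pareto scaling is taken in its usual sense of mean-centering each variable and dividing by the square root of its standard deviation.

```python
import math
from statistics import mean, stdev


def bucket_spectrum(ppm, intensity, tsp_area, width=0.04, low=0.3, high=10.0,
                    excluded=((4.75, 4.90), (3.28, 3.34))):
    """Sum TSP-scaled intensities into equal-width buckets over [low, high).

    Points inside an excluded region (residual HDO and CD3OD signals)
    do not contribute to any bucket.
    """
    n_buckets = math.ceil((high - low) / width)
    buckets = [0.0] * n_buckets
    for shift, value in zip(ppm, intensity):
        if not (low <= shift < high):
            continue
        if any(start <= shift <= end for start, end in excluded):
            continue
        index = min(int((shift - low) / width), n_buckets - 1)
        buckets[index] += value / tsp_area
    return buckets


def pareto_scale(rows):
    """Mean-center each column and divide it by sqrt(standard deviation).

    Rows are samples and columns are buckets; constant columns are
    only centered, which leaves them at zero.
    """
    scaled_columns = []
    for column in zip(*rows):
        center = mean(column)
        spread = stdev(column) if len(column) > 1 else 0.0
        divisor = math.sqrt(spread) if spread > 0 else 1.0
        scaled_columns.append([(x - center) / divisor for x in column])
    return [list(row) for row in zip(*scaled_columns)]


# Invented six-point spectrum; the points at 3.30 and 4.80 ppm are excluded.
shifts = [0.31, 0.32, 0.52, 3.30, 4.80, 9.99]
signal = [2.0, 4.0, 6.0, 100.0, 100.0, 8.0]
buckets = bucket_spectrum(shifts, signal, tsp_area=2.0)
scaled = pareto_scale([[1.0, 5.0], [3.0, 5.0], [5.0, 5.0]])
print(len(buckets), buckets[0], scaled[0])
```

The Pareto-scaled matrix, with one row per RIL, is what would then be passed to a PCA routine; the PCA step itself is omitted here.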

QTL Analysis
The genetic map was constructed using JoinMap 4.0 [45,61]. MAPQTL6.0 [61] was used for QTL analysis. First, the interval mapping procedure was performed to detect major QTL. For each trait, a permutation test with 1000 permutations was performed to calculate the LOD threshold corresponding to a genome-wide significance level of 5% (p < 0.05). Markers with LOD scores equal to or exceeding the threshold were used as cofactors in multiple-QTL-model (MQM) mapping. If new QTLs were detected, the linked markers were added to the cofactor list, and the MQM analysis was repeated. If the LOD value of a marker dropped below the threshold in the new model, it was removed from the cofactor list, and the MQM analysis was rerun. This procedure was repeated until the cofactor list became stable. The final LOD score for each trait was determined by restricted MQM (rMQM) mapping. In some cases, rMQM mapping showed that some cofactors should be on the same linkage group, but at slightly different positions. In that case, the new marker was selected as a cofactor and the whole procedure was repeated.
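The idea behind the permutation-based LOD threshold can be sketched as follows. This is a simplified illustration and not the MapQTL procedure: it assumes a single-marker LOD score (rather than interval mapping), complete genotype data coded "A"/"B", and hypothetical function names and example data. Trait values are shuffled across lines, the highest LOD over all markers is recorded for each shuffle, and the 95th percentile of these maxima is taken as the 5% threshold.

```python
import math
import random


def marker_lod(phenotypes, genotypes):
    """Single-marker LOD: (n / 2) * log10(RSS_null / RSS_marker)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss_null = sum((y - grand_mean) ** 2 for y in phenotypes)
    rss_marker = 0.0
    for allele in set(genotypes):
        group = [y for y, g in zip(phenotypes, genotypes) if g == allele]
        group_mean = sum(group) / len(group)
        rss_marker += sum((y - group_mean) ** 2 for y in group)
    if rss_null <= 0.0:
        return 0.0
    if rss_marker <= 0.0:
        return math.inf
    return (n / 2.0) * math.log10(rss_null / rss_marker)


def permutation_threshold(phenotypes, markers, n_perm=1000, alpha=0.05, seed=0):
    """Return the (1 - alpha) quantile of the maximum LOD under permutation."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        maxima.append(max(marker_lod(shuffled, m) for m in markers))
    maxima.sort()
    index = min(max(math.ceil((1.0 - alpha) * n_perm) - 1, 0), n_perm - 1)
    return maxima[index]


# Invented data: 20 lines, marker 0 has a large effect, marker 1 has none.
trait = [10.0 + 0.1 * i for i in range(10)] + [5.0 + 0.1 * i for i in range(10)]
marker_0 = ["A"] * 10 + ["B"] * 10
marker_1 = ["A", "B"] * 10
threshold = permutation_threshold(trait, [marker_0, marker_1], n_perm=200, seed=1)
observed = marker_lod(trait, marker_0)
print(f"threshold = {threshold:.2f}, observed LOD at marker 0 = {observed:.2f}")
```

Markers whose observed LOD equals or exceeds the threshold would then be candidates for the cofactor list in the iterative MQM procedure.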

Conclusions
The detected genotypic variation in tocopherol seed concentration and seedling metabolites in the RIL population under study allowed the detection of several QTLs for these compounds. The loci we detected can be used to establish diagnostic markers for marker-assisted selection for improved nutritional quality (mainly tocopherol and glucosinolate concentrations). The further analysis of these QTLs affecting metabolic processes will increase our knowledge about the regulatory control of biosynthetic pathways.