QTL and Candidate Genes for Seed Tocopherol Content in ‘Forrest’ by ‘Williams 82’ Recombinant Inbred Line (RIL) Population of Soybean

Soybean seeds are rich in secondary metabolites that are beneficial for human health, including tocopherols. Tocopherols play an important role in human and animal nutrition thanks to their antioxidant activity. In this study, the ‘Forrest’ by ‘Williams 82’ (F×W82) recombinant inbred line (RIL) population (n = 306) was used to map quantitative trait loci (QTL) for seed α-tocopherol, β-tocopherol, δ-tocopherol, γ-tocopherol, and total tocopherol contents in Carbondale, IL over two years. Candidate genes involved in the soybean tocopherol biosynthetic pathway were also identified. A total of 32 QTL controlling various seed tocopherol contents were identified and mapped on Chrs. 1, 2, 5, 6, 7, 8, 9, 10, 12, 13, 16, 17, and 20. One major and novel QTL was identified on Chr. 6 with an R² of 27.8, 9.96, and 6.95 for δ-tocopherol, α-tocopherol, and total tocopherol content, respectively. Reverse BLAST analysis of the genes identified in Arabidopsis allowed the identification of 37 genes involved in the soybean tocopherol pathway, among which 11 were located close to the identified QTLs. The tocopherol cyclase gene (TC) Glyma.06G084100 is located close to the QTLs controlling δ-tocopherol (R² = 27.8), α-tocopherol (R² = 9.96), and total tocopherol (R² = 6.95). The geranylgeranyl diphosphate reductase (GGDR) gene Glyma.05G026200 is located close to a QTL controlling total tocopherol content in soybean (R² = 4.42). The two methylphytylbenzoquinol methyltransferase (MPBQ-MT) candidate genes Glyma.02G002000 and Glyma.02G143700 are located close to a QTL controlling δ-tocopherol content (R² = 3.57). The two γ-tocopherol methyltransferase (γ-TMT) genes, Glyma.12G014200 and Glyma.12G014300, are located close to QTLs controlling (γ+β)-tocopherol content (R² = 8.86) and total tocopherol content (R² = 5.94). The identified seed tocopherol QTLs and candidate genes will be beneficial in breeding programs to develop soybean cultivars with high tocopherol contents.


Introduction
Tocopherols and tocotrienols collectively constitute the tocochromanol family, known as vitamin E. Tocochromanols are fat-soluble phenolic compounds synthesized by photosynthetic organisms. In soybean, vitamin E is present almost exclusively as tocopherols. Tocopherols exist in four isoforms, α-tocopherol, γ-tocopherol, β-tocopherol, and δ-tocopherol, which differ from each other in the number and location of their methyl groups. α-Tocopherol possesses three methyl groups, followed by γ-tocopherol and β-tocopherol, which have two methyl groups, and finally δ-tocopherol, with only one methyl group [1].
Tocopherols have an important role in human and animal nutrition thanks to their vitamin E activity. From a nutritional perspective, however, α-tocopherol is the most important of the four tocopherol isoforms because of its high vitamin E activity [2]. It has also been reported to play a role in the prevention of cardiovascular diseases and cancer [3,4]. In the human body, α-tocopherol is preferentially accumulated due to its affinity for the liver α-tocopherol transfer protein (α-TTP), which enriches plasma with α-tocopherol [5].
Soybean (Glycine max Merr.) is not only one of the main sources of vegetable oil and animal feed worldwide but is also used for the production of biofuel and aquaculture feed and as a source of protein for the human diet, owing to its high protein (40-42%) and oil (18-22%) contents [6], which make it an important crop worldwide.
Soybean seeds are rich in secondary metabolites beneficial for human health, including tocopherols. Total tocopherol content is relatively high in soybean seeds compared to other oilseed crops, and γ-tocopherol is the predominant form, while α-tocopherol content is less than 10% of the total tocopherol content [3,4,7].
Since soybean oil provides 30% of the total worldwide oil consumption and ~70% of the vitamin E in the American diet comes from soybean oil, developing soybean cultivars with high seed α-tocopherol contents could have tremendous positive effects on the health benefits associated with eating soybeans and on their market value.
In soybean, tocopherol seed content and composition vary from one cultivar to another and are controlled by several genetic and environmental factors. These factors make it one of the most complex quantitative traits [8], and many studies have focused on investigating the genetic and molecular factors underlying this trait [9][10][11][12].
Tocopherol biosynthesis takes place at the plastid envelope, where two precursors derived from different pathways are combined. Homogentisic acid (HGA), a product of the cytosolic shikimate pathway, forms the aromatic ring of tocopherols, while phytyl diphosphate (PDP), a product of either the methylerythritol phosphate (MEP) pathway or the phytol recycling pathway [15], forms the prenyl tail. The condensation of these two precursors is catalyzed by homogentisate phytyl transferase (HPT) and yields 2-methyl-6-phytyl-1,4-benzoquinol (MPBQ), which can be further methylated by MPBQ methyltransferase (MPBQ-MT) to 2,3-dimethyl-6-phytyl-1,4-benzoquinone (DMPBQ). Cyclization of MPBQ and DMPBQ by tocopherol cyclase (TC) produces δ-tocopherol and γ-tocopherol, respectively. The conversion of γ-tocopherol and δ-tocopherol to α-tocopherol and β-tocopherol, respectively, is catalyzed by γ-tocopherol methyltransferase (γ-TMT) and represents the last step of the tocopherol biosynthesis pathway.
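The sequence of reactions in this paragraph can be written down as a small lookup structure. The Python sketch below is illustrative only: the tuple layout and the function name enzymes_upstream_of are assumptions introduced here, and the step list assumes that TC cyclizes MPBQ to δ-tocopherol and DMPBQ to γ-tocopherol, consistent with the role of TC given in the Discussion.

```python
# Each step is (substrates, enzyme, product); abbreviations follow the text.
# The data structure is a hypothetical illustration, not part of the study.
PATHWAY = [
    (("HGA", "PDP"), "HPT", "MPBQ"),
    (("MPBQ",), "MPBQ-MT", "DMPBQ"),
    (("MPBQ",), "TC", "delta-tocopherol"),
    (("DMPBQ",), "TC", "gamma-tocopherol"),
    (("delta-tocopherol",), "gamma-TMT", "beta-tocopherol"),
    (("gamma-tocopherol",), "gamma-TMT", "alpha-tocopherol"),
]


def enzymes_upstream_of(product, pathway=PATHWAY):
    """Return the set of enzymes needed to reach `product` from HGA and PDP."""
    needed = set()
    stack = [product]
    while stack:
        target = stack.pop()
        for substrates, enzyme, made in pathway:
            if made == target:
                needed.add(enzyme)
                stack.extend(substrates)  # walk back to the precursors
    return needed


print(sorted(enzymes_upstream_of("alpha-tocopherol")))
print(sorted(enzymes_upstream_of("delta-tocopherol")))
```

Walking the steps backwards in this way makes explicit why TC lies upstream of every tocopherol isoform, whereas MPBQ-MT and γ-TMT affect only some of them.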
Many studies have elucidated the genes involved in the tocopherol pathway in Arabidopsis. Tyrosine produced by the shikimate pathway is converted by tyrosine aminotransferase (TAT) into p-hydroxyphenylpyruvate (HPP). HPP is then converted by p-hydroxyphenylpyruvate dioxygenase (HPPD) into homogentisic acid; this enzyme is encoded by the PDS1 gene. In A. thaliana, pds1 mutants lack tocopherols and plastoquinone and display a lethal photobleached phenotype, which shows the importance of PDS1 in the tocopherol biosynthesis pathway [16]. Overexpression of the PDS1 gene in tobacco leaves or in A. thaliana seeds only moderately increased tocopherol concentrations [17,18]. Phytyl diphosphate (PDP) can be derived either from the MEP pathway, after reduction of geranylgeranyl diphosphate (GGDP) by geranylgeranyl diphosphate reductase (GGDR), or from the phytol recycling pathway. Many studies have investigated the phytol recycling pathway and have shown that vte5-1 mutants are devoid of phytol kinase. vte5-1 mutants also showed reductions in total tocopherol content of 80% in seeds and 65% in leaves compared to the wild type [19]. The VTE5 gene encodes the phytol kinase that phosphorylates phytol to produce phytyl monophosphate, which is in turn phosphorylated by the phytyl phosphate kinase VTE6, leading to PDP formation. Arabidopsis vte6 mutants show tocopherol deficiency in leaves and reductions in plant growth and seed longevity. Overexpression of the VTE6 gene resulted in a two-fold increase in PDP and in higher γ-tocopherol accumulation in seeds [20]. Homogentisate phytyl transferase (HPT) catalyzes the condensation of HGA and PDP to produce 2-methyl-6-phytyl-1,4-benzoquinol (MPBQ). In Arabidopsis, the HPT enzyme is encoded by the VTE2 gene [21,22].
Arabidopsis plants lacking VTE2 show a complete deficiency in all tocopherol derivatives and all pathway precursors, indicating that this is a crucial step in the tocopherol biosynthetic pathway [23]. The MPBQ-MT enzyme is encoded by the VTE3 gene and catalyzes a limiting step in the production of α- and γ-tocopherol. In Arabidopsis, vte3-2 mutants lacked α- and γ-tocopherol, exhibited a pale green phenotype and abnormal chloroplasts, and did not survive beyond the seedling stage [24,25].
The VTE4 gene encodes the γ-tocopherol methyltransferase (γ-TMT) that catalyzes the methylation of γ-tocopherol and δ-tocopherol to produce α-tocopherol and β-tocopherol, respectively. The co-expression of both At-VTE3 and At-VTE4 in soybean led to an accumulation of >95% α-tocopherol, in addition to a 5-fold increase in seed vitamin E activity [28].
In this study, the genetic factors associated with tocopherol content in soybean were investigated: QTL for seed α-tocopherol, β-tocopherol, δ-tocopherol, γ-tocopherol, and total tocopherol contents were mapped, the link between the tocopherol biosynthesis genes and soybean seed tocopherol content was studied, and the tocopherol biosynthetic pathway in soybean was reconstructed in silico.

The SNP-Based Genetic Map
The SNP-based genetic map used in this study was described previously and was used to identify QTLs that control seed isoflavone contents [31]. The map, which covered 4029.9 cM, was composed of 2075 SNP markers and was based on 306 RILs of F×W82 [31].

Tocopherol Contents Frequency Distribution, Heritability, and Correlation
Based on the Shapiro-Wilk normality test, the frequency distributions of the different tocopherol contents were not always normal in the F×W82 RIL population.
Only total tocopherol 2017 (T-Toc-2017) and δ-tocopherol 2020 (δ-Toc-2020) were normally distributed. Positive or negative skewness and kurtosis values greater than 3 were also observed in the RIL population (Table 1; Figure 1). Each tocopherol component showed a different degree of variation, and the variability did not appear to be greatly affected by the environment. α-Tocopherol 2017 (α-Toc-2017) displayed the highest coefficient of variation (CV; 83.64%), and the CV of α-Toc-2020 was 55.49%, indicating that phenotypic variability among tocopherol contents was relatively constant over the two years.
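As an illustration of the summary statistics discussed above, the following Python sketch computes the CV, skewness, and kurtosis of a trait using only the standard library. The function name describe and the input values are hypothetical and are not taken from Table 1; the kurtosis is computed on the scale where a normal distribution gives 3, which is assumed to be the convention behind the ">3" threshold. The Shapiro-Wilk test itself is not reimplemented here (it is available, for example, as scipy.stats.shapiro).

```python
import statistics


def describe(values):
    """Return CV (%), skewness and Pearson kurtosis for one trait."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    # Central moments (population form) used for skewness and kurtosis.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "cv_percent": 100 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # equals 3 for a normal distribution
    }


# Hypothetical tocopherol contents (µg/g), for illustration only.
toy_trait = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(describe(toy_trait))
```

A positive skewness indicates a longer right tail, and a kurtosis above 3 indicates heavier tails than a normal distribution.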
The broad-sense heritability (h²) for seed α-tocopherol (α-Toc), δ-tocopherol (δ-Toc), γ+β-tocopherol ((γ+β)-Toc), and total tocopherol (T-Toc) contents (in µg/g of dry seed weight) across the two years was quite diverse. δ-Toc had the highest heritability (71%), and the h² for T-Toc was 41% (Table 2). However, the h² values for (γ+β)-Toc and α-Toc were negative (−41% and −61%, respectively), implying that there was biologically meaningful phenotypic repulsion among these traits. The high heritability of seed δ-Toc content suggested that a large portion of the phenotypic variation could be detected in the mapped QTL. Based on our two-way ANOVA, the RIL × year interaction still played a significant role in the formation of tocopherols in soybean seeds because σ²GE was relatively high (data not shown); it should therefore be used as a parameter for trait improvement.
Owing to the cost constraints of this undergraduate student-centered project, only technical replicates could be used, and F values and probabilities could not be generated from the dataset (Table 2). Hence, we calculated only the Sum Sq and Mean Sq to determine σ²G and σ²GE for each trait (Table 2), using the type I sum of squares from the anova(model) function in R; σ²e could not be estimated because of the limited replicates.
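To show how variance components and a negative h² can arise from mean squares alone, a minimal Python sketch is given below. It is not the authors' script: it assumes a RIL × year layout without true replication, so that the interaction mean square stands in for σ²GE, with σ²G = (MS_G − MS_GE)/y and h² = σ²G / (σ²G + σ²GE/y), where y is the number of years. The function name and the mean squares are hypothetical.

```python
def broad_sense_h2(ms_genotype, ms_gxe, n_years):
    """Estimate (var_G, var_GE, h2) from ANOVA mean squares.

    Assumed model (not confirmed by the source): no true replicates, so the
    genotype-by-year mean square is used directly as the GE variance.
    """
    var_g = (ms_genotype - ms_gxe) / n_years
    var_ge = ms_gxe
    h2 = var_g / (var_g + var_ge / n_years)
    return var_g, var_ge, h2


# Hypothetical mean squares, for illustration only.
print(broad_sense_h2(ms_genotype=8.0, ms_gxe=2.0, n_years=2))
print(broad_sense_h2(ms_genotype=2.0, ms_gxe=3.0, n_years=2))
```

Under these assumptions, h² becomes negative whenever the genotype mean square is smaller than the interaction mean square, because the estimate of σ²G is then negative.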

In Silico Reconstruction of the Tocopherol Biosynthetic Pathway in Soybean
The tocopherol biosynthetic pathway has been investigated in the model plant Arabidopsis thaliana. The genes and compounds involved in that pathway were previously reported [33]. To reconstruct the tocopherol biosynthesis pathway in soybean, the reverse BLAST of these genes was conducted using SoyBase.

The Association between the Identified Tocopherol Pathway Candidate Genes and the Identified Tocopherol QTL
Among the identified candidate genes, 11 were located close to the identified QTLs on Chrs. 2, 5, 6, 10, 12, and 17 (Table 4, Figure 2). These candidate genes include the tocopherol cyclase (TC) candidate gene Glyma.06G084100, which is located close to seven seed tocopherol QTLs controlling δ-tocopherol, α-tocopherol, and total tocopherol on Chr. 6 (Table 4).

Association between the Identified Candidate Genes and the Previously Reported QTL
The identified genes were mapped to the previously reported QTL regions associated with soybean seed tocopherols using data from SoyBase and previous studies describing the QTL underlying tocopherol contents in soybean [7,13,34,35]. Six candidate genes were located within the previously reported seed tocopherol QTLs, and ten were very close to some of these regions (Table 5).

Organ-Specific Expression of the Identified Candidate Genes
To investigate the role of the 37 identified candidate genes, their expression was analyzed in cv. Williams 82 using the publicly available RNA-seq database at SoyBase (https://soybase.org/; accessed on 3 April 2022). The tissues included in this dataset were leaves, nodules, roots, pods, and seeds. Among the 37 candidate genes, no RNA-seq data were available for the TC candidate gene Glyma.04G082400, the HPT candidate gene Glyma.08G274800, and the MPBQ-MT candidate gene Glyma.10G178600. The rest of the tocopherol candidate genes presented different expression patterns. Most of the identified candidate genes were expressed in all the analyzed tissues, except for the HPT candidate gene Glyma.03G033100, which was not expressed in any of the tissues. The two GGDR candidate genes, Glyma.05G026200 and Glyma.17G100700, the HPT candidate gene, Glyma.13G097800, and the MPBQ-MT candidate gene, Glyma.02G143700, were highly expressed in flowers. The two GGDR candidate genes Glyma.05G026200 and Glyma.17G100700, the MPBQ-MT candidate genes Glyma.02G143700, Glyma.02G002000, and Glyma.10G030600, the TC candidate gene Glyma.06G084100, and the γ-TMT candidate gene Glyma.12G014300 were abundantly expressed in leaves. The GGDR candidate genes Glyma.05G026200, Glyma.02G273800, and Glyma.17G100700, and the TAT candidate gene Glyma.12G161500 were highly expressed in seeds. The TAT candidate genes Glyma.06G235900 and Glyma.12G205900, the two GGDR candidate genes Glyma.05G026200 and Glyma.17G100700, and the MPBQ-MT candidate genes Glyma.02G143700 and Glyma.10G030600 were highly expressed in pods.

Discussion
Tocopherols are lipophilic antioxidants that are important for human health due to their ability to prevent the oxidation of unsaturated fatty acids by scavenging free radicals and to prevent cell membrane damage [13]. Soybean seeds contain the highest tocopherol concentrations among all legume species [36]. The dominant tocopherol isoform in soybean seeds is γ-tocopherol, with amounts reaching up to 70% of the total tocopherol content, while the α-tocopherol isoform has a lower concentration of about 10% of the total tocopherol content. The α-tocopherol isoform has the highest vitamin E activity [4] and the highest affinity for the hepatic tocopherol transfer protein. Therefore, improving soybean seed tocopherol composition and content is crucial.
The QTL associated with δ-tocopherol explains 27.87% of the phenotypic variation, and the one associated with α-tocopherol explains only 9.96%. A TC gene (Glyma.06G084100) was identified close to these QTLs; the TC enzyme is involved directly in the conversion of MPBQ to δ-tocopherol and indirectly in the production of α-tocopherol (Table 4, Figure 2). α-Tocopherol is the best-known potent fat-soluble antioxidant; it is preferentially absorbed and accumulated in humans [38], and its activity has been demonstrated in the prevention and treatment of heart disease, cancer, and Alzheimer's disease [39]. α-Tocopherol is regarded by health professionals as the most beneficial tocopherol compound. Unfortunately, this compound is present in small amounts in soybean oil compared with sunflower, canola, or corn oil [40]. Therefore, improving α-tocopherol content in soybean is a priority for the soy industry. The identification, in both years of data, of qα-Toc-2-IL-2017 (192.6-197.6 cM) and qα-Toc-3-IL-2020 (195-197 cM), which collocate on Chr. 6, will provide an opportunity for breeding lines with high α-tocopherol content.
Although previous studies have reported some soybean genes as candidates for the tocopherol biosynthesis pathway [41], the present study provides the most comprehensive analysis of the whole soybean genome and identifies the potential candidate genes for the tocopherol biosynthetic pathway in soybean.
Most QTL regions that were identified in 2017 were not found in 2020, except qα-Toc-2-IL-2017 (192.6-197.6 cM) and qα-Toc-3-IL-2020 (195-197 cM), which collocate on Chr. 6. This could be explained by the difference in weather conditions between 2017 and 2020. In August 2017, the temperature ranged between 8 and 33.3 °C, while in August 2020 it ranged between 13.3 and 32.8 °C (https://www.extremeweatherwatch.com/; accessed on 3 April 2022). Temperature stress during all stages of development has been shown to affect soybean seed tocopherol concentrations [42].
Although previous studies identified QTL regions for soybean seed tocopherol content on Chr. 6, all of those QTLs map to the region between 74.5 and 118.5 cM (Table S2) [7,34,35]. The QTL regions identified in this study on Chr. 6 cluster between 173.7 and 207 cM, a region that encompasses an important gene in the tocopherol biosynthesis pathway, namely the tocopherol cyclase candidate gene Glyma.06G084100. This QTL on Chr. 6 explains 27.8% of the variation in δ-tocopherol, 9.96% in α-tocopherol, 6.16% in γ+β-tocopherol, and 6.95% in total tocopherol content.
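The collocation statements above amount to testing whether two intervals on the same chromosome overlap. A minimal Python sketch using the Chr. 6 positions quoted in this section is shown below; the helper name overlap is hypothetical, and the comparison assumes that the positions being compared refer to the same genetic map.

```python
def overlap(a, b):
    """Return the shared (start, end) of two cM intervals, or None."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


# Chr. 6 intervals (cM) as quoted in the text.
q_alpha_2017 = (192.6, 197.6)    # qα-Toc-2-IL-2017
q_alpha_2020 = (195.0, 197.0)    # qα-Toc-3-IL-2020
previous_qtl = (74.5, 118.5)     # region reported by earlier studies
this_study = (173.7, 207.0)      # cluster identified in this study

print(overlap(q_alpha_2017, q_alpha_2020))  # (195.0, 197.0)
print(overlap(previous_qtl, this_study))    # None
```

The first comparison reproduces the shared 195-197 cM segment of the two α-tocopherol QTLs, and the second shows that the cluster reported here does not overlap the previously reported Chr. 6 region.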

Plant Materials
The F6:13 'Forrest' × 'Williams 82' RIL population (n = 306) described previously was used in this study [30,43]. The parents and RILs were grown in Carbondale, southern Illinois, in 2017 and 2020, and seeds of all RILs and parents were harvested at maturity.

Tocopherols Quantification
At maturity, seeds of the parents and RILs were harvested and analyzed for α-tocopherol (α-Toc), δ-tocopherol (δ-Toc), γ+β-tocopherol ((γ+β)-Toc), and total tocopherols (T-Toc) using a protocol developed and validated in the Nguyen Lab at the University of Missouri. Briefly, approx. 1 g of soybean seeds was ground to a fine powder with a Thomas Wiley Mini-Mill, followed by lyophilization for 48 h. Approx. 200 mg of powder was mixed with 2 mL of 200-proof ethanol and vortexed, followed by incubation with agitation at 75 °C for 2 h. The products were then filtered into HPLC vials for analysis along with standard solutions of tocopherols. Tocopherols were quantified with an external calibration curve method, in which each curve was created with six standard solutions of 0.62, 1.25, 2.5, 5, 10, and 20 µg/mL.
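The external calibration step can be sketched as a least-squares line through the six standards, which is then inverted to convert a sample peak area into a concentration. The Python example below is a hedged illustration rather than the validated protocol: the peak areas are hypothetical, the function names are introduced here, and the conversion to µg/g assumes that all tocopherol from approx. 200 mg of powder is recovered in the 2 mL extract without further dilution.

```python
def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def seed_content_ug_per_g(peak_area, slope, intercept,
                          extract_ml=2.0, powder_g=0.2):
    """Convert a sample peak area to µg tocopherol per g of seed powder."""
    conc_ug_per_ml = (peak_area - intercept) / slope  # invert the curve
    return conc_ug_per_ml * extract_ml / powder_g


# Standard concentrations from the text (µg/mL).
standards = [0.62, 1.25, 2.5, 5, 10, 20]
# Hypothetical detector responses, generated from a perfectly linear curve.
areas = [50 * c + 3 for c in standards]

slope, intercept = fit_line(standards, areas)
print(seed_content_ug_per_g(253.0, slope, intercept))
```

With real chromatograms, one curve would be fitted per tocopherol isoform, and samples falling outside the 0.62-20 µg/mL range of the standards would need dilution or re-extraction rather than extrapolation.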

DNA Isolation, SNP Genotyping, and Genetic Map Construction
DNA isolation, SNP genotyping, and the construction of the F×W82 genetic linkage map have been described earlier [30]. Briefly, SNP genotyping was performed with Illumina Infinium SoySNP6K BeadChips (Illumina, Inc., San Diego, CA, USA), and the genetic map was constructed with JoinMap 4.0 software using a LOD score of 3.0 and a maximum distance of 50 cM.

Seed Tocopherols QTL Detection
We used WinQTL Cartographer 2.5 [31] and both interval mapping (IM) and composite interval mapping (CIM) methods to identify QTL that control seed α-Toc, δ-Toc, γ+ß-Toc, and T-Toc in this RIL population; however, only QTL detected with CIM are reported here. QTL identified via IM are reported in the supplementary data section (Table S4A,B). MapChart 2.2 [32] was used to draw chromosomes with CIM tocopherols QTL locations.

Tocopherols Candidate Genes Identification
The reverse BLAST of the genes underlying the tocopherol pathway in Arabidopsis was conducted using the data available at SoyBase (https://soybase.org/; accessed on 3 April 2022). The sequences of the Arabidopsis genes were obtained from the Phytozome database (https://phytozome-next.jgi.doe.gov; accessed on 3 April 2022) and were used as BLAST queries in SoyBase. The resulting candidate genes for the tocopherol biosynthetic pathway were then mapped to the identified tocopherol QTL.

Expression Analysis
The expression analysis of the identified tocopherol candidate genes that are located within or close to the identified seed tocopherol QTLs was performed using the publicly available data from SoyBase (https://soybase.org/; accessed on 3 April 2022) to produce the expression profiles of these candidate genes in the soybean reference genome Williams 82, using the Glyma1.0 gene models.
The Forrest and Williams 82 sequences of the eleven candidate genes located close to the identified QTLs were compared. Three of these genes, Glyma.06G084100, Glyma.17G061900, and Glyma.17G100700, have SNPs between the Forrest and Williams 82 sequences (Figure 4). The TC candidate gene Glyma.06G084100 has 5 SNPs in the coding sequence, one of which causes a missense mutation (T379A) (Figure 4), in addition to 12 SNPs and 2 InDels in the 5′ UTR region (Table S3). The HPT candidate gene, Glyma.17G061900, has only one SNP, located in the coding sequence, which causes a missense mutation (G326A) (Figure 4). The GGDR candidate gene, Glyma.17G100700, also has only one SNP, which causes a silent mutation (Figure 4). These SNPs could play a role in the difference in tocopherol content between the Forrest and Williams 82 cultivars. Glyma.06G084100 is associated with the qδ-Toc (Table 3, Figures S1 and S2), while Glyma.17G061900 and Glyma.17G100700 are associated with qT-Toc-5-(2020) on Chr. 17 (Table 2, Figures S1 and S2). These genes could be used in breeding programs or with gene editing technology to develop soybean lines and cultivars that produce high amounts of the beneficial tocopherols (vitamin E) for human consumption.

Patents
A patent resulting from this work is under submission.