Quantitative Trait Loci and Candidate Genes That Control Seed Sugars Contents in the Soybean ‘Forrest’ by ‘Williams 82’ Recombinant Inbred Line Population

Soybean seed sugars are among the most abundant beneficial compounds in soybean seeds for human and animal consumption. Higher contents of seed sugars such as sucrose are desirable because they contribute to the taste and flavor of soy-based foods. The objectives of this study were to use the ‘Forrest’ by ‘Williams 82’ (F × W82) recombinant inbred line (RIL) soybean population (n = 309) to identify quantitative trait loci (QTLs) and candidate genes that control seed sugar (sucrose, stachyose, and raffinose) contents in two environments (North Carolina and Illinois) over two years (2018 and 2020). A total of 26 QTLs controlling seed sugar contents were identified and mapped on 16 soybean chromosomes (chrs.). Five QTL regions, on chrs. 2, 5, 13, 17, and 20, were identified in both locations, Illinois and North Carolina. Of the 57 candidate genes identified in this study, 16 were located within 10 megabases (MB) of the identified QTLs. Among them, a cluster of four genes involved in the sugar pathway was co-located within 6 MB of two QTLs detected on chr. 17. Further functional validation of the identified genes could benefit breeding programs aiming to produce soybean lines with high sucrose and low raffinose family oligosaccharide (RFO) contents.


Sugar QTL Detection
The interval mapping (IM) and composite interval mapping (CIM) methods implemented in WinQTL Cartographer [33] were used to identify QTLs that control seed sugar (sucrose, stachyose, and raffinose) contents in this RIL population, with the following parameters: Model 6, a 1 cM step size, a 10 cM window size, 5 control markers, and 1000 permutations. Chromosome maps were drawn using MapChart 2.2 [34].
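The permutation step above can be illustrated with a simplified sketch. It is not WinQTL Cartographer's CIM (Model 6 additionally fits background control markers within a window and scans positions between markers); it only shows how a genome-wide LOD threshold is obtained by shuffling phenotypes across RILs and recording the maximum LOD of each permutation. It assumes markers coded 0/1 for the two homozygous parental genotypes and uses the single-marker regression LOD = (n/2)·log10(RSS0/RSS1); all function names are hypothetical.

```python
import math
import random


def lod_single_marker(genotypes, phenotypes):
    """Regression LOD for one marker coded 0/1 (the two homozygous classes in a RIL)."""
    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    rss0 = sum((y - grand_mean) ** 2 for y in phenotypes)  # null model: no QTL
    rss1 = 0.0  # alternative model: separate mean per marker genotype
    for allele in (0, 1):
        group = [y for g, y in zip(genotypes, phenotypes) if g == allele]
        if group:
            group_mean = sum(group) / len(group)
            rss1 += sum((y - group_mean) ** 2 for y in group)
    if rss1 == 0.0:
        return math.inf  # marker separates the phenotype perfectly
    return (n / 2) * math.log10(rss0 / rss1)


def max_lod(markers, phenotypes):
    """Largest LOD across all markers (the genome-wide maximum)."""
    return max(lod_single_marker(g, phenotypes) for g in markers)


def permutation_threshold(markers, phenotypes, n_perm=1000, alpha=0.05, seed=1):
    """(1 - alpha) quantile of the permuted genome-wide maximum LOD."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any marker-trait association
        maxima.append(max_lod(markers, shuffled))
    maxima.sort()
    index = min(len(maxima) - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return maxima[index]
```

A marker whose observed LOD exceeds the returned value would be declared significant at the genome-wide alpha level; `n_perm=1000` mirrors the number of permutations reported above.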

Identification of Sugar Biosynthesis Candidate Genes
The Glyma identifiers of the sucrose and raffinose family oligosaccharide (RFO) biosynthesis genes were obtained by reverse BLAST of the Arabidopsis genes underlying the RFO pathway, using the data available in SoyBase. The sequences of the Arabidopsis genes were obtained from the Phytozome database (https://phytozome-next.jgi.doe.gov; accessed on 15 August 2023) and used as BLAST queries in SoyBase. The resulting soybean genes of the RFO pathway were then mapped relative to the identified sugar QTLs.
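Positioning genes relative to QTLs reduces to an interval-distance check once each QTL has been converted to physical coordinates (for example through its flanking markers, which is an assumption here). The sketch below flags genes lying less than 10 MB from a QTL, the cut-off used in this study; the `Interval` type, the function names, and any coordinates passed to them are hypothetical.

```python
from dataclasses import dataclass

MB = 1_000_000


@dataclass
class Interval:
    name: str
    chrom: str
    start: int  # physical start in bp (inclusive)
    end: int    # physical end in bp (inclusive)


def distance_bp(gene, qtl):
    """Gap in bp between a gene and a QTL; 0 if they overlap, None if on different chromosomes."""
    if gene.chrom != qtl.chrom:
        return None
    if gene.end < qtl.start:
        return qtl.start - gene.end
    if gene.start > qtl.end:
        return gene.start - qtl.end
    return 0


def genes_near_qtls(genes, qtls, window_bp=10 * MB):
    """Return (gene, QTL, distance) for every gene lying less than window_bp from a QTL."""
    hits = []
    for gene in genes:
        for qtl in qtls:
            d = distance_bp(gene, qtl)
            if d is not None and d < window_bp:
                hits.append((gene.name, qtl.name, d))
    return hits
```

Checking every gene against every QTL is quadratic, which is unproblematic for a few dozen candidate genes and QTLs.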

Expression Analysis
Expression profiles of the identified candidate genes in the soybean reference cultivar Williams 82 were obtained from publicly available data in SoyBase [20], based on the Glyma1.0 gene models.

Comparison of the Williams 82 and Forrest Sequences
Sequences of the Forrest and Williams 82 cultivars were obtained from a variant calling and haplotyping analysis performed on 108 sequenced soybean germplasm lines, as described previously [35].
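Once both cultivars are genotyped against the same reference, comparing them amounts to finding sites where their genotype calls differ. The sketch below assumes a multi-sample VCF whose sample columns are named `Forrest` and `Williams82` (hypothetical names) and whose FORMAT field contains GT; it skips missing calls and labels each differing site as a SNP or an InDel from the REF/ALT allele lengths. It is a plain-Python illustration, not the pipeline used in [35].

```python
def normalized_gt(sample_field, format_field):
    """Extract GT from a VCF sample column; treat phased and unphased calls alike."""
    values = dict(zip(format_field.split(":"), sample_field.split(":")))
    alleles = values.get("GT", ".").replace("|", "/").split("/")
    return "/".join(sorted(alleles))


def differing_variants(vcf_lines, sample_a, sample_b):
    """List (chrom, pos, kind, gt_a, gt_b) for sites where the two samples differ."""
    header = None
    differences = []
    for line in vcf_lines:
        if line.startswith("##") or not line.strip():
            continue  # meta-information or blank line
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            header = fields
            continue
        if header is None:
            raise ValueError("data line found before the #CHROM header")
        gt_a = normalized_gt(fields[header.index(sample_a)], fields[8])
        gt_b = normalized_gt(fields[header.index(sample_b)], fields[8])
        if "." in gt_a.split("/") or "." in gt_b.split("/") or gt_a == gt_b:
            continue  # missing call or identical genotypes
        ref, alts = fields[3], fields[4].split(",")
        # Single-base REF and ALT -> SNP; anything else (including multi-base substitutions) -> InDel
        kind = "SNP" if len(ref) == 1 and all(len(a) == 1 for a in alts) else "InDel"
        differences.append((fields[0], int(fields[1]), kind, gt_a, gt_b))
    return differences
```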

Sugar Frequency Distribution
Based on the Shapiro-Wilk normality test, the frequency distributions of sucrose, raffinose, and stachyose contents differed considerably in the F × W82 RIL population. Raffinose (2018), stachyose (2018), and sucrose (2020) were normally distributed. Both positive and negative skewness values were observed in the RIL population, and all kurtosis values were positive (Table 1; Figure 1). Sucrose in 2018 showed the highest coefficient of variation (CV, 62.86%) among the assessed traits, whereas the CVs of the other traits were lower in both years. The histogram of sucrose in 2018 was strongly skewed, and the other traits evaluated were normally distributed.
Table 1. Means, ranges, CVs, skewness, and kurtosis of seed sugar contents in the F × W82 RIL population evaluated in Spring Lake, NC (2018) and Carbondale, IL (2020). Mean and range values are expressed in µg/g of seed weight. ** p < 0.01, *** p < 0.001.
The broad-sense heritability (H²) of sucrose, raffinose, and stachyose contents across the two environments differed markedly. Stachyose had the highest heritability (92%), and the H² of sucrose was 36.8% (Table 2). No negative H² values were observed for any sugar. Nevertheless, RIL × year interactions contributed significantly to the variation in all three seed sugar contents. The sums of squares and mean squares used to determine σ²G and σ²GE for each trait (Table 2) were computed with type I (sequential) sums of squares using the anova(model) function in R.
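The summary statistics in Table 1 can be reproduced with moment-based formulas. This sketch computes the mean, range, CV (sample SD divided by the mean, × 100), skewness, and excess kurtosis for one trait; the text does not state whether Table 1 reports excess or raw kurtosis or which skewness estimator was used, so those choices are assumptions. The Shapiro-Wilk test itself is not reimplemented here; it is available, for example, as `shapiro.test` in R or `scipy.stats.shapiro` in SciPy.

```python
import math
import statistics


def describe(values):
    """Mean, range, CV (%), moment skewness, and excess kurtosis of one trait."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    m2 = sum((x - mean) ** 2 for x in values) / n  # central moments (n denominator)
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "cv_percent": 100 * sd / mean,
        "skewness": m3 / m2 ** 1.5,             # > 0: long right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3,    # > 0: heavier tails than a normal
    }
```

Applied per trait and year, this yields one row of a table like Table 1; a strongly right-skewed trait such as sucrose in 2018 would show a large positive skewness.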

Association between the Identified Candidate Genes and the Previously Reported QTLs
Several studies have identified and mapped QTLs underlying seed sugar contents using different populations and mapping methods [39,40,41,42], as summarized in [18].

Organ-Specific Expression of the Identified Candidate Genes
The expression patterns of the identified candidate genes were investigated in Williams 82 cv. using the RNA-seq data available in SoyBase [20]. The dataset covers several plant tissues, such as leaves, nodules, roots, pods, and seeds (Figures 3A,B and S2). Four of the fifty-seven identified candidate genes have no available RNA-seq data: the sucrose synthase candidate genes Glyma.03G216300, Glyma.17G045800, and Glyma.19G212800, and the UDP-D-glucose-4-epimerase candidate gene Glyma.18G211700 (Figure S2). The raffinose synthase candidate gene Glyma.04G145800 was not expressed in any of the analyzed tissues, whereas the remaining genes showed diverse expression patterns.
The sucrose synthase candidate genes Glyma.09G073600 and Glyma.13G114000 were highly expressed in all analyzed tissues except young leaves, whereas the raffinose synthase candidate gene Glyma.17G111400 was abundantly expressed in all analyzed tissues except seeds and nodules. The sucrose synthase candidate gene Glyma.15G182600 was highly expressed in all tissues except young leaves and nodules. The raffinose synthase candidate gene Glyma.03G137900 was abundantly expressed in flowers, nodules, and roots, while the raffinose synthase candidate gene Glyma.14G010500 and the invertase candidate gene Glyma.05G236600 were highly expressed in flowers and pods. The UDP-D-glucose-4-epimerase candidate gene Glyma.05G204700 was abundantly expressed in flowers and seeds, and the invertase candidate gene Glyma.20G177200 was highly expressed in nodules and roots. Finally, the raffinose synthase candidate gene Glyma.06G179200 was highly expressed in seeds (Figures 3A and S2).
Seventeen of the identified candidate genes were located less than 10 MB from the identified QTL regions. Glyma.09G073600 was highly expressed in Williams 82 seeds, followed by Glyma.17G111400, Glyma.17G035800, and Glyma.08G043800, which showed moderate expression. The remaining genes showed low expression in seeds, and Glyma.02G016700, Glyma.06G175500, Glyma.09G016600, Glyma.10G017300, and Glyma.19G004400 were not expressed in Williams 82 seeds at all.
Plants 2023, 12, x FOR PEER REVIEW 13 of 20
Figure 3 (caption excerpt): … [20]. RNA-seq data are not available in SoyBase for the Glyma.17G045800 candidate gene.

Comparison of the Williams 82 and Forrest Sequences
The sequences of the candidate genes located less than 10 MB from the identified QTLs were compared. The results showed that six of them had SNPs and InDels between the Forrest and Williams 82 sequences: Glyma.09G073600, Glyma.08G143500, Glyma.05G003900, Glyma.17G035800, Glyma.17G111400, and Glyma.09G016600 (Table S4, Figure 4).
The sucrose synthase candidate gene Glyma.09G073600 had 28 SNPs and 7 InDels in total; three of these SNPs were located upstream of the 5′UTR, two were downstream of the 3′UTR, and seven were in exons (Table S4, Figure 4). The invertase candidate gene Glyma.08G143500 had 20 SNPs and 5 InDels; one of these SNPs, in exon 7, caused a missense mutation, and two SNPs were located upstream of the 5′UTR (Table S4, Figure 4). The raffinose synthase candidate gene Glyma.05G003900 had nine SNPs and one InDel; four of those SNPs were in exons, and two of them resulted in missense mutations (Table S4, Figure 4). Likewise, the raffinose synthase candidate gene Glyma.09G016600 carried 12 SNPs and 2 InDels; two of the SNPs were in exons and resulted in missense mutations, and both InDels were also in exons (Table S4, Figure 4). The raffinose synthase candidate gene Glyma.17G111400 had eight SNPs: one upstream of the 5′UTR, one downstream of the 3′UTR, and six in exons, all causing silent mutations (Table S4, Figure 4). Finally, the UDP-D-glucose-4-epimerase candidate gene Glyma.17G035800 had two SNPs, both located in introns (Table S4).
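Classifying each variant as above combines two steps: locating it relative to the gene model (upstream, exon, intron, downstream) and, for coding SNPs, comparing the reference and alternative codons under the standard genetic code. The sketch below assumes a plus-strand gene with inclusive coordinates and a coding sequence that starts at ATG; minus-strand genes would need reverse complementation first, and InDel frameshifts are not handled. Function names and coordinates are hypothetical, and this is not necessarily how the effects in Table S4 were annotated.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def snp_region(pos, tx_start, tx_end, exons):
    """Locate a variant on a plus-strand gene; exons are inclusive (start, end) pairs, UTR exons included."""
    if pos < tx_start:
        return "upstream"
    if pos > tx_end:
        return "downstream"
    if any(start <= pos <= end for start, end in exons):
        return "exon"
    return "intron"


def coding_effect(cds, cds_pos, alt):
    """Effect of a single-base substitution at 0-based position cds_pos of a coding sequence."""
    cds = cds.upper()
    codon_start = cds_pos - cds_pos % 3
    offset = cds_pos - codon_start
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return "missense"
```

For example, `coding_effect("ATGGCTTGGTAA", 4, "A")` returns `"missense"` because the codon GCT (Ala) becomes GAT (Asp), while changing position 5 to C gives GCC, which still encodes Ala, so the result is `"silent"`.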

Discussion
Soybean seed sugars play a major role in seed and fruit development. Soy products have recently become very popular owing to a growing demand for vegan diets [45], and the quality and taste of these products depend on the soybean seed sugar content [39].
Given the importance of seed sucrose content for the quality of soybean-based food and feed products, breeding programs focus on developing soybean lines with high sucrose and low RFO contents [43,46]. Soybean varieties with a high sucrose content are therefore valuable for soybean food and feed products [47].
Mapping QTLs associated with sugar components using different types of molecular markers is one approach that researchers use to breed high-sucrose soybean. In soybean and other crops, seed sugar contents are well established as complex polygenic traits, and many studies, including this one, have mapped QTLs for sugar contents using various mapping populations. These include biparental populations whose parents do not necessarily differ in sugar contents, such as the ‘MD96-5722’ by ‘Spencer’ RIL population [30].
The SNP-based genetic linkage map facilitated the identification of QTLs for seed isoflavone contents [28] and seed tocopherol contents [29], as well as the seed sugar (sucrose, stachyose, and raffinose) content QTLs reported in the current study.
The broad-sense heritability (H²) of sucrose, stachyose, and raffinose was estimated to be 37.5%, 73.9%, and 92%, respectively. The environment and genotype × environment interactions are well known to play a major role in trait heritability [28,29,43,54,55]. A trait whose biosynthesis involves several genes may be expected to have a lower heritability than one whose biosynthesis involves fewer genes. Figure 2 shows the number of potential genes involved in sucrose biosynthesis versus those involved in raffinose and stachyose biosynthesis, suggesting an association between the heritability values and the number of genes involved in each biosynthetic pathway.
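The heritability values above derive from ANOVA mean squares (Table 2). The exact estimator is not given in the text, so the sketch below assumes the common entry-mean formula for RILs grown in e environments with r replicates: σ²G = (MS_G - MS_GE)/(r·e), σ²GE = (MS_GE - MS_E)/r, and H² = σ²G / (σ²G + σ²GE/e + σ²E/(r·e)). Setting negative variance components to zero is also an assumption. With a single observation per RIL per year, the residual mean square of the type I ANOVA contains both GE and error variance and can be passed as both `ms_ge` and `ms_e`.

```python
def variance_components(ms_g, ms_ge, ms_e, n_env, n_rep):
    """Method-of-moments variance components from mean squares (negative estimates set to 0)."""
    sigma2_e = ms_e
    sigma2_ge = max((ms_ge - ms_e) / n_rep, 0.0)
    sigma2_g = max((ms_g - ms_ge) / (n_rep * n_env), 0.0)
    return sigma2_g, sigma2_ge, sigma2_e


def broad_sense_h2(ms_g, ms_ge, ms_e, n_env, n_rep=1):
    """Entry-mean broad-sense heritability: sigma2_G over the phenotypic variance of RIL means."""
    sigma2_g, sigma2_ge, sigma2_e = variance_components(ms_g, ms_ge, ms_e, n_env, n_rep)
    phenotypic = sigma2_g + sigma2_ge / n_env + sigma2_e / (n_env * n_rep)
    return sigma2_g / phenotypic if phenotypic > 0 else 0.0
```

For illustration only (these are not the study's mean squares), `broad_sense_h2(10, 2, 1, n_env=2, n_rep=2)` returns 0.8, i.e., H² = 80%. Without clipping, this estimator simplifies to 1 - MS_GE/MS_G, so a large genotype × environment mean square relative to the genotype mean square lowers H².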
Further studies are needed to characterize these genes, identify their enzymes and protein products, and understand their roles in the sugar biosynthetic pathway in soybean.

Conclusions
In summary, we identified 26 QTLs associated with seed sugar contents and 57 candidate genes involved in the sucrose, raffinose, and stachyose biosynthetic pathways. Of these candidate genes, 16 were located less than 10 MB from the QTL regions identified in this study.
Five QTL regions, on chrs. 2, 5, 13, 17, and 20, were identified in both environments, NC and IL. Five genes (Glyma.09G073600, Glyma.08G143500, Glyma.17G111400, Glyma.05G003900, and Glyma.09G016600) have SNPs and InDels between the Forrest and Williams 82 sequences. These variants could potentially explain the difference in sugar content between the Forrest and Williams 82 cultivars.
Further studies are required to functionally characterize these genes and validate their roles in the sugar biosynthetic pathway in soybean before they are used in breeding programs to produce soybean lines with high sucrose and low RFO contents.