Genome-Wide Association Mapping for Seed Weight in Soybean with Black Seed Coats and Green Cotyledons

The yield of soybean (Glycine max (L.) Merr.) is based on several components, such as the number of plants per unit area, pod number per plant, number of nodes, and seed weight. The hundred-seed weight (HSW) is an important component affecting soybean yield, and it also determines which soy products for human consumption a cultivar is suited to. In this study, we conducted genome-wide association studies with 470 soybean accessions with black seed coats and green cotyledons and applied an online tool with publicly available genome sequencing data. The objective of the study was to identify the genomic regions in the soybean genome associated with seed weight and to identify the candidate genes in linkage disequilibrium blocks where the most significant SNPs were located. This study identified significant SNPs for seed weight on chromosomes 2 and 16. Furthermore, this study indicated that GmCYP78A57 (Glyma.02G119600), which encodes a cytochrome P450 monooxygenase, may be a candidate gene for controlling seed size in soybean. We assumed that another gene on chromosome 16 may have a small additive genetic effect that reduces seed size along with GmCYP78A57. An online tool was used to identify 12 allelic variations of GmCYP78A57 with publicly available genomic sequence data. The HSW of 45 accessions having a missense mutation, obtained from the Germplasm Resources Information Network, ranged from 4.4 to 17.6 g, and 19 of these accessions had an HSW of less than 10.0 g. This information can support the development of molecular markers for use in soybean breeding programs to release new cultivars with increased or decreased seed weight.


Introduction
Soybean (Glycine max (L.) Merr.) is one of the most economically and nutritionally important crops worldwide because it contains 40% protein, 20% oil, and 15% soluble carbohydrates in the seed. Soybean is mainly used for producing high-protein meals for livestock and vegetable oils. Its yields have slowly increased in the past decades [1,2]. The yield is determined by several components, such as the number of plants per unit area, pod number per plant, number of nodes, and seed weight [3]. The seed weight is not only an important component affecting the soybean yield, but also a determinant of soy products for human consumption [4][5][6][7]. Small-seed soybeans are used for the production of high-quality soybean sprouts and natto, whereas large-seed soybeans are preferred to produce tofu, soybean paste, edamame, and miso soup [8]. Therefore, it is important to understand the genetic basis of hundred-seed weight (HSW) in improving the potential of soybean yield and the associated soybean food quality. In addition, this understanding will provide helpful information in the soybean breeding program to develop a new cultivar.
The HSW trait is a complex, quantitatively inherited trait controlled by multiple genes with small additive genetic effects [9]. Many quantitative trait loci (QTLs) controlling the HSW of soybean have been reported through linkage analysis and genome-wide association studies (GWAS). To date, a total of 304 QTL regions for seed weight have been documented in SoyBase [10] using intraspecific and interspecific mapping populations with different genetic backgrounds, and 94 single nucleotide polymorphisms (SNPs) have been associated with seed weight through GWAS. Studies indicated that the HSW is highly heritable, with a heritability of 98%, indicating that genotypic value is an important factor controlling HSW variation in soybean seeds [11,12].
GWAS, also called association mapping or linkage disequilibrium (LD) mapping, uses unrelated individuals to detect the SNPs associated with traits of interest, such as agronomic, seed composition, and disease-related traits in soybeans [13][14][15][16][17][18][19]. Such mapping studies have identified QTLs in many crop species, such as rice, maize, and soybean. Determinants such as population size, genetic diversity, and genetic structure influence the precision of the QTL regions identified by GWAS analysis [20]. For the HSW, GWAS with different genetic backgrounds were conducted to identify significant SNPs in soybean [12,17,18,19]. Zhang et al. [17] reported the results of GWAS analyses with 366 Chinese soybean landrace accessions, which identified 39 candidate genes for the HSW trait. Zhang et al. [12] conducted GWAS for HSW with 309 plant introductions from Maturity Groups 0 and 00 of the USDA soybean germplasm collection. They indicated that 22 loci showed minor effects on HSW. In addition, many QTLs were detected in multiple environments by QTL mapping in intraspecific and interspecific crossing populations [21][22][23][24][25].
Genetic studies of seed size have been well reported in Arabidopsis and rice. Several signaling pathways, including the ubiquitin-proteasome pathway, G-protein signaling, mitogen-activated protein kinase signaling, phytohormone perception, and transcriptional regulatory factors, have been shown to control seed size [26]. Recently, a transcriptional regulatory complex consisting of PEAPOD2 (PPD2), kinase inducible domain interacting 8/9 (KIX 8/9), and TOPLESS (TPL) has been reported to negatively regulate the expression of D3-type cyclins [27]. Null mutations in either PPD or KIX in Arabidopsis increased the size of organs such as seeds and leaves [28][29][30]. Recently, Nguyen et al. [31] demonstrated, using fast neutron mutant populations, that GmKIX8-1 (Glyma.17G112800), a repressor regulating D3-type cyclins, controls seed weight and leaf size in soybean. Other causal genes for seed weight and size have also been reported in soybean. Sun et al. [32] reported that overexpression of miR156 improved soybean architecture and yield, with increased HSW as well as increased numbers of branches, nodes, and pods. In addition, PP2C (Glyma.17G221100), which encodes a putative phosphatase 2C protein, has been identified as controlling seed weight through linkage analysis of an intraspecific crossing population [33]. PP2C may be associated with GmBZR1, one of the key transcription factors in brassinosteroid signaling, and thereby promote seed weight and size in soybeans.
Cytochrome P450s (CYPs) are involved in biochemical pathways that produce secondary metabolites and plant hormones, including brassinosteroid, gibberellin, abscisic acid, and jasmonic acid [34,35]. CYP78A subfamily genes, which contain a single oxygenase, have been found to produce enlarged leaves and flowers, larger-diameter stems, and larger seeds in Arabidopsis and rice [34][35][36][37][38][39][40]. Some of the CYP78A genes control cell proliferation and expansion in plants. In Arabidopsis, mutation of CYP78A5 caused early termination of cell division, resulting in smaller organs such as petals, sepals, leaves, and stems [34]. In rice, CYP78A13 is associated with the balance between embryo and endosperm size. In addition, Miyoshi et al. [41] reported that, in rice, CYP78A11 is associated with the regulation of leaf development. Through a reverse genetic approach, Zhao et al. [42] investigated the role of CYP78A genes in soybean seed size and reported that the overexpression of GmCYP78A72 produced larger soybean seeds. However, they reported that knock-down of GmCYP78A72 alone did not decrease the seed size, whereas silencing of three GmCYP78A genes reduced the grain size of transgenic plants. In addition, a homologous CYP78A gene, GmCYP78A10, was reported to be associated with seed size as well as pod number, plant height, and branch number in soybean [43].
Soybeans possessing black seed coats with green cotyledons (BLG) have been used as traditional ingredients in medicinal treatments in China, Japan, and Korea [44]. Due to increasing consumer awareness of BLG soybeans, they have become a preferred food ingredient in South Korea. Recently, Lee et al. [45] reported a wide phenotypic variation of HSW, which ranged from 9.1 g to 49.3 g for 470 BLG accessions. The HSW of elite soybean cultivars ranged from 18.0 to 20.0 g, whereas that of wild soybean ranged from 3.0 to 4.0 g [46]. This suggests that BLG germplasm is a good source of material for identifying unique, favorable, and rare alleles and for understanding the genetic basis of HSW in soybeans.
With the recent advancements in sequencing technologies, the utilization of whole-genome sequencing is now more feasible. In addition, the cost of whole-genome sequencing has declined significantly, and genomes can be sequenced faster and at higher depth, making it possible to identify genes of interest and their allelic variation. Since the genome sequencing efforts for cultivated and wild soybeans were completed in 2010 [47,48], the amount of re-sequencing data for soybeans has risen over the last decade [49][50][51][52]. Online tools and databases have been developed with publicly available genome sequencing data, such as SoyBase [10], Phytozome [53], and SoyKB [54]. In this study, we conducted GWAS with agronomic traits of 470 BLG accessions [45] and 6K SNPs [55], and applied online tools with publicly available genome sequencing data. The objective of this study was to identify the QTLs in the soybean genome that are associated with HSW, and to identify the candidate genes in LD blocks where the most significant SNPs are located.

Growth Conditions of BLG Germplasm and Phenotype Collection
To conduct the GWAS analysis, 470 BLG accessions, including three cultivars (Cheongja, Cheongja 3 and Uram), formed the total population [45,55]. The 470 BLG accessions were grown at Gyeongsangbuk-do Agricultural Research and Extension, Daegu, Republic of Korea in 2013, 2014 and 2015, with planting dates of 14 June, 29 May, and 15 June, respectively. Each accession was planted by hand in a single row 1 m long with a row-to-row spacing of 80 cm. Each single row was harvested in bulk at the R8 harvest maturity stage [56]. Five randomly selected plants per plot were used to measure plant height and number of nodes per plant. Harvested soybeans from each plot were measured for HSW.

DNA Extraction and Determination of Genotypes for BLG Accessions
Genotypic information for the 470 BLG accessions was described in Jo et al. [38]. Young trifoliate leaves of each BLG accession, including the three cultivars, were collected. The leaves of each accession were ground into a fine powder with a mortar and pestle in liquid nitrogen. Powder (20 mg) from each sample was used to extract the genomic DNA using the cetyltrimethylammonium bromide method with a minor modification [57]. The quantity and quality of the genomic DNA of each accession were determined by running it on a 1.5% agarose gel. Genomic DNA (30 µL) at a concentration of 100 ng/µL from each of the 470 accessions was genotyped with the BARCSoySNP6K BeadChip at the National Instrumentation Center for Environmental Management (NICEM; Republic of Korea) at Seoul National University [58]. A total of 5122 SNP alleles were called using the Genome Studio Genotyping Module (Illumina, Inc. San Diego, CA, USA) [59].

Genome-Wide Association Studies
A total of 4459 SNPs were used for association mapping after filtering in the TASSEL software to exclude SNPs with >20% missing data and rare SNPs (minor allele frequency, MAF < 0.01). Principal component analysis (PCA) was then performed with the 4459 SNPs [60]. A general linear model (GLM) with PCA was compared with a mixed linear model (MLM) using TASSEL software and the GAPIT R package. The kinship coefficient matrix was used to provide an estimate of additive genetic variance [60,61]. In the present study, the p values used to populate the Manhattan plots were produced by an MLM with PCA and kinship [61,62]. The significance of associations between SNPs and traits was based on false discovery rate (FDR) analyses.
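The filtering step can be illustrated with a minimal Python sketch. This is not the TASSEL procedure itself; the genotype coding (0, 1, 2 copies of one allele, None for a missing call), the function names, and the toy SNP names are assumptions made for illustration only. The thresholds follow the text: SNPs with more than 20% missing calls or with a minor allele frequency below 0.01 are dropped.

```python
from typing import Dict, List, Optional

# Assumed coding: 0, 1, 2 = copies of one allele; None = missing call.
Genotypes = List[Optional[int]]


def missing_rate(calls: Genotypes) -> float:
    """Fraction of accessions with no genotype call for this SNP."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Genotypes) -> float:
    """Frequency of the less common allele among the non-missing calls."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    allele_freq = sum(observed) / (2 * len(observed))
    return min(allele_freq, 1.0 - allele_freq)


def filter_snps(
    snps: Dict[str, Genotypes],
    max_missing: float = 0.20,
    min_maf: float = 0.01,
) -> Dict[str, Genotypes]:
    """Keep SNPs that pass both the missing-data and the MAF threshold."""
    return {
        name: calls
        for name, calls in snps.items()
        if missing_rate(calls) <= max_missing
        and minor_allele_frequency(calls) >= min_maf
    }


# Hypothetical toy data: one good SNP, one with 30% missing, one monomorphic.
toy_snps = {
    "snp_common": [0, 1, 2, 0, 1, 2, 0, 0, 1, 2],
    "snp_missing": [0, None, None, None, 1, 2, 0, 0, 1, 2],
    "snp_monomorphic": [0] * 10,
}
kept = filter_snps(toy_snps)
print(sorted(kept))
```

In this toy set only the first SNP survives, because the second exceeds the missing-data threshold and the third has no minor allele at all.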

Linkage Disequilibrium Estimation
Distances between SNPs and their physical positions were calculated using the Glycine max Wm82.a2 reference genome. Pairwise LD between SNPs was calculated as the squared correlation coefficient (r²) of alleles using TASSEL software. The r² for SNPs with pairwise distance in a window of 100 SNPs was used to draw the average LD decay figure with an R script [63]. The LD decay rate of the BLG accessions was measured as the chromosomal distance at which r² dropped to half of its maximum value [64].
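The half-maximum rule for LD decay can be sketched as follows. This is only an illustration in Python, not the R script used in the study; the bin width, the function name, and the example pairs are assumptions. Pairwise r² values are averaged within distance bins, and the decay distance is reported as the midpoint of the first bin whose mean r² is at or below half of the largest binned mean.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def ld_decay_distance(
    pairs: Iterable[Tuple[float, float]], bin_kb: float = 100.0
) -> Optional[float]:
    """Return the distance (kb) at which mean r^2 falls to half its maximum.

    pairs: (distance_kb, r2) for each SNP pair.
    The result is the midpoint of the first distance bin whose mean r^2
    is <= half of the largest binned mean, or None if that never happens.
    """
    bins = defaultdict(list)
    for distance_kb, r2 in pairs:
        bins[int(distance_kb // bin_kb)].append(r2)
    if not bins:
        return None

    # Mean r^2 per bin, ordered by increasing distance.
    means = sorted((idx, sum(vals) / len(vals)) for idx, vals in bins.items())
    half_max = max(mean_r2 for _, mean_r2 in means) / 2.0

    for idx, mean_r2 in means:
        if mean_r2 <= half_max:
            return (idx + 0.5) * bin_kb
    return None


# Hypothetical pairwise values, for illustration only.
example_pairs = [(50, 0.8), (150, 0.6), (250, 0.45), (350, 0.35), (450, 0.2)]
print(ld_decay_distance(example_pairs))
```

Binning before thresholding is a design choice that smooths the noisy pairwise values; a fitted decay curve would be an alternative.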

Online Tool and Phenotypic Data Set from GRIN
The soybean allele catalog, as an online tool, was used to identify the allelic variation through SoyKB [65]. The input was the name of the gene of interest (GmCYP78A57, Glyma.02G119600). The list of accessions can be downloaded from the online tool. Based on the list, phenotypic data of HSW were obtained from SoyBase [66].

Statistical Data Analysis
All statistical analyses in this study were conducted in SAS v9.4 (SAS Institute, 2013). A comparison of the measured chlorophyll and anthocyanin between the two groups was determined using genotyping, and a Student's t-test analysis (p < 0.05) was conducted using PROC TTEST in SAS. Mean differences among the genotypic groups were analyzed with Fisher's Least Significant Difference (LSD) test at p = 0.05 using PROC GLM. For the correlation analysis, PROC CORR of SAS code was used.

Phenotypic Distribution of Agronomic Traits in BLG Germplasm
The phenotypic distributions of plant height, number of nodes, and HSW for the 470 BLG accessions over the three years of this study were evaluated (Figure 1). BLG accessions displayed continuous variation, suggesting that plant height, number of nodes, and HSW are quantitative traits. The plant height of the 470 BLG accessions ranged from 49.6 to 151.6 cm, with an average height of 87.0 cm (Figure 1A). Forty-two accessions had a plant height of less than 60.0 cm. Eighty-one of the BLG accessions had a plant height between 60.0 and 80.0 cm. In addition, 73.0% of the total accessions had a plant height of more than 80.0 cm in this study. The three cultivars, Cheongja 3, Cheongja, and Uram, were 74.2, 77.7, and 62.9 cm in plant height, respectively. The number of nodes in BLG accessions ranged from 12.3 to 25.9 (Figure 1B). In this study, plant height was strongly positively correlated with the number of nodes (r = 0.84, p < 0.001). The HSW of BLG accessions ranged from 9.1 to 49.3 g, with an average HSW of 33.9 g (Figure 1C). Only one accession (BLG466, 9.1 g) had an HSW of less than 10.0 g. In this study, 380 of the BLG accessions had an HSW of more than 30.0 g. The HSW of Cheongja 3, Cheongja, and Uram was 38.7, 41.5, and 30.3 g, respectively. Furthermore, the phenotypic distribution of HSW was left-skewed.
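The correlation between plant height and number of nodes is a Pearson correlation coefficient (computed in the study with PROC CORR in SAS). As a minimal sketch of the statistic itself, the Python function below computes r from paired measurements; the function name and the example values are hypothetical and are not data from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical plant heights (cm) and node counts for five accessions.
heights_cm = [55.0, 72.0, 88.0, 101.0, 140.0]
node_counts = [13.0, 15.5, 17.0, 20.0, 24.5]
print(round(pearson_r(heights_cm, node_counts), 3))
```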

GWAS with Plant Height and Number of Nodes
With a total of 4459 SNPs, a GWAS was performed with the MLM, which greatly reduced the false-positive rate, and quantile-quantile (QQ) plots were produced (Figure 1). The results of the MLM analyses for plant height, number of nodes, and HSW across the three years are summarized in Table 1. Two SNPs on chromosome 19 that were detected consistently across the three years were associated with plant height, with −log10(p) values of 10.8 and 6.1 in the MLM association analysis. Six SNPs on four different chromosomes that were detected consistently across years were significantly associated with the number of nodes in BLG accessions. The most significant SNP (Gm19_45204441) was colocalized for plant height and number of nodes. In the haplotype block, a candidate gene for plant height and number of nodes may be Dt1 (Glyma.19G194300), which is involved in the regulation of stem growth habit.
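Manhattan-plot scores are reported on the −log10(p) scale, so it can help to convert them back to p values. The short Python sketch below does this for the two plant-height scores quoted above (10.8 and 6.1); the function names are illustrative only.

```python
import math


def neg_log10(p_value: float) -> float:
    """Convert a p value to the -log10(p) scale used in Manhattan plots."""
    return -math.log10(p_value)


def p_from_neg_log10(score: float) -> float:
    """Convert a -log10(p) score back to the underlying p value."""
    return 10.0 ** (-score)


for score in (10.8, 6.1):
    print(f"-log10(p) = {score} corresponds to p = {p_from_neg_log10(score):.2e}")
```

A score of 10.8 therefore corresponds to a p value on the order of 10⁻¹¹, and a score of 6.1 to a p value just under 10⁻⁶.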


GWAS of HSW in BLG Accessions and Candidate Gene Prediction
In addition, six significant SNPs for HSW were located on chromosomes 2 and 16 (p < 0.05). The most significant SNPs were Gm02_8896955 and Gm16_31822897 on chromosomes 2 and 16, respectively. For further analysis, genotype "A" denotes the adenine base of Gm02_8896955, whereas "a" denotes the guanine base of Gm02_8896955. Genotype "B" denotes the adenine base of Gm16_31822897 and "b" the guanine base of Gm16_31822897. Based on the genotype of SNP Gm02_8896955, the HSW of genotype "AA" was 34.5 ± 8.3 g (mean ± standard deviation), which is significantly higher than that of genotype "aa" (20.9 ± 6.2 g) (Figure 2A). The interaction of the two SNPs is shown in Figure 2B. The HSW of genotype "AABB" was significantly higher than that of the other genotypes. In addition, genotype "aabb" (17.6 ± 3.9 g) exhibited the smallest seed size among the BLG accessions. Although the difference between genotypes "AAbb" and "aaBB" was not significant, their mean HSW values were 29.2 g and 24.2 g, respectively. For the HSW trait, there were 255 genes in the LD block on chromosome 2, and part of the LD block with its genes is shown in Figure 3. Among them, the candidate gene GmCYP78A57 encodes a member of the cytochrome P450 family, which has been shown in Arabidopsis and soybean to be associated with seed size. The SNP on chromosome 16 had a minor allelic effect, but may be associated with HSW in the BLG accessions in this study.

Figure 2. Hundred-seed weight by genotype at Gm02_8896955 (A) and plot showing the interaction of two SNPs on chromosomes 2 and 16 (B). Genotype "A" represents the adenine base of Gm02_8896955, whereas "a" is the guanine base of Gm02_8896955. Genotype "B" indicates the adenine base of Gm16_31822897 and "b" the guanine base of Gm16_31822897. Statistical analysis was conducted using the Student's t-test (*** p < 0.001). Bars indicate standard deviation. LSD is the least significant difference between genotypic classes in hundred-seed weight, and different letters on bars indicate differences at the 5% level of significance.

Figure 3. LD block on chromosome 2 for hundred-seed weight. Glyma.02G119600, which encodes CYP78A57, is indicated by the red arrow. Each dot represents a SNP. Red dots are the most significant SNPs for hundred-seed weight.
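The two-locus genotype classes described for Gm02_8896955 and Gm16_31822897 ("AABB", "AAbb", "aaBB", "aabb") can be summarized with a small grouping routine. The Python sketch below is illustrative only: the record layout, the function name, and the HSW values are hypothetical and are not the study's data. It reports the mean, standard deviation, and count per class, which is the kind of summary behind the mean ± standard deviation values quoted in the text.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple


def summarize_by_genotype(
    records: Iterable[Tuple[str, str, float]]
) -> Dict[str, Tuple[float, float, int]]:
    """Group HSW values by two-locus genotype class.

    records: (genotype at chr2 SNP, genotype at chr16 SNP, HSW in g),
             e.g. ("AA", "bb", 29.0).
    Returns {class: (mean, standard deviation, n)}; the standard deviation
    is reported as 0.0 when a class has a single accession.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    for geno_chr2, geno_chr16, hsw in records:
        groups[geno_chr2 + geno_chr16].append(hsw)
    return {
        cls: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0, len(vals))
        for cls, vals in groups.items()
    }


# Hypothetical accessions, for illustration only.
toy_records = [
    ("AA", "BB", 36.0),
    ("AA", "BB", 34.0),
    ("AA", "bb", 29.0),
    ("aa", "BB", 24.0),
    ("aa", "bb", 18.0),
    ("aa", "bb", 17.0),
]
summary = summarize_by_genotype(toy_records)
for cls in sorted(summary, key=lambda c: summary[c][0], reverse=True):
    avg, sd, n = summary[cls]
    print(f"{cls}: {avg:.1f} ± {sd:.1f} g (n = {n})")
```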

Allelic Variation of Candidate Gene Analyzed with Publicly Available Genome Sequencing Data
GmCYP78A57 consists of two exons and one intron, with the gene structure shown in Figure 4A. The soybean allele catalog, an online tool, was used to identify the allelic variation of GmCYP78A57 with publicly available genomic sequence data. In this study, we used 952 accessions including 107 Glycine soja, 649 soybean cultivars, 196 landraces, and 146 undefined accessions (Figure 4B). There were 11 missense mutations and one frameshift in exons 1 and 2 of GmCYP78A57, namely: A-T at physical position 11,775,156; T-C at 11,775,225; deletion of G at 11,775,251; C-T at 11,775,266; T-C at 11,775,270; T-C at 11,775,297; G-A at 11,775,425; G-A at 11,775,440; G-A at 11,775,450; C-G at 11,775,686; G-T at 11,775,753; and C-G at 11,776,568 (Figure 4B). Variants marked with red lines were found in cultivars, whereas variants marked with blue lines were from wild soybean. Among them, nine missense mutations were found in the wild soybean accessions, whereas three variants were from the genomes of cultivars. These results indicated that there is wider allelic variation in the wild soybean than in the cultivars. In addition, 91% of the total accessions (868 out of 952), including Glycine soja, cultivars, and landraces, had functional GmCYP78A57. Forty-three cultivars, representing 95.6% of the total cultivars (43/45 cultivars), had the SNP variant at position G11,775,440A, resulting in a glycine-to-serine variant at amino acid position 110.

Phenotypic Data Set of HSW from GRIN
The influence of the possible candidate gene, GmCYP78A57, on seed size was investigated by comparing the variants of cultivars with the GRIN seed weight data available on SoyBase [49]. Of the 69 soybean cultivars and landraces, 45 accessions had reported HSW data, consisting of 26 soybean cultivars and 19 landraces (Figure 5; Supplementary Table S1). Of these, only PI416890 had the SNP variant at position G11,775,425A, resulting in an alanine-to-threonine variant at amino acid position 105, whereas the rest had the G110S variant. The HSW of these accessions ranged from 4.4 to 17.6 g, with a mean of 10.6 ± 3.2 g. Nineteen accessions had an HSW of less than 10.0 g.
Agronomy 2022, 12, x. https://doi.org/10.3390/xxxxx www.mdpi.com/journal/agronomy


Discussion
BLG soybeans have been used as traditional ingredients in medicinal treatments in China, Japan, and Korea [44]. Studies have indicated that daily consumption of black soybeans may reduce the risk of breast cancer and cardiovascular diseases [67][68][69][70]. Due to the health benefits of BLG soybean, consumers in Korea prefer to cook it with rice and other side dishes. Higher HSW is a trait of interest for developing new BLG cultivars in Korea because of consumer preference [71]. This study investigated the phenotypic variation of HSW in BLG accessions, which ranged from 9.1 to 49.3 g, with a mean value of 33.9 g (Figure 1C). Among them, 380 accessions had an HSW of more than 30.0 g. The power of GWAS to detect the significant SNPs associated with the trait of interest is determined by the phenotypic variance [72,73]. We assumed that BLG accessions could be valuable materials for identifying genes that increase seed size in soybeans. In addition, our previous study with BLG accessions reported that the most significant SNPs related to anthocyanin composition were colocalized with the O locus, which corresponds to an anthocyanidin reductase gene, and the R locus, which is the R2R3 MYB transcription factor that upregulates UDP-glycose: flavonoid 3-O-glycosyltransferase (UF3GT) in black soybeans [55].
Seed weight is an important yield component in soybeans, with a positive correlation between HSW and yield [5,6]. Although HSW is a complex and quantitative trait, understanding its genetic basis can provide useful information for improving the yield potential of soybean. Genetic studies of seed size have been well reported in Arabidopsis and rice. In Arabidopsis, the CYP78A5 (KLU) gene, which encodes a member of the cytochrome P450 family, was reported to increase flower and seed size [36]. Working with orthologs of CYP78A5, Zhao et al. [42] indicated that GmCYP78A genes were associated with the regulation of seed size in soybeans through reverse genetic approaches, such as overexpression and knock-down. They found that overexpression of GmCYP78A72 in soybeans and Arabidopsis resulted in increased seed size. However, knock-down of the single GmCYP78A72 gene did not result in decreased seed size in soybean, whereas silencing three GmCYP78A genes together reduced the seed size of transgenic plants [42]. In addition, Wang et al. [43] found that mutant alleles of GmCYP78A10 in wild and cultivated soybean were associated with smaller seed size. In this study, significant SNPs associated with the seed size of soybean were identified on chromosome 2 through a forward genetic approach. The guanine base of the most significant SNP on chromosome 2 was associated with a statistically significant reduction in the seed size of BLG accessions. The result of this study indicated that GmCYP78A57 may be a candidate gene for controlling seed size in soybean. In addition, an interaction plot between the SNPs on chromosomes 2 and 16 supported a minor allelic effect of the QTL on chromosome 16 on seed size in this study (Figure 2). On the basis of this result, we assume that another gene on chromosome 16 may have an additive effect that reduces seed size together with GmCYP78A57.
The molecular mechanisms that control seed size have been well characterized in Arabidopsis and rice. Several signaling pathways have been shown to control seed size [26]. In soybean, genes for seed size have been found to be involved in different pathways. The PPD/KIX/TPL complex is involved in cell proliferation, resulting in increased soybean seed size [31]. In addition, PP2C (Glyma.17G221100) from Chinese wild soybean, which is associated with brassinosteroid signaling, has been identified as controlling seed weight [33]. The loci identified in this study may be associated with CYP78A genes. However, BLG accessions with small seeds were still present in genotypic group "AA" of Gm02_8896955 (Figure 2A). We therefore assume that different pathways reduce seed weight in the small-seeded BLG accessions with "AA" of Gm02_8896955. In addition, our previous study reported analyses of population structure and PCA that revealed three clusters in the 470 BLG accessions. Small-seeded BLG accessions with "AA" of Gm02_8896955 were in cluster 3, whereas those with the "aabb" genotype belonged to cluster 2, which had relatively higher genetic diversity than clusters 1 and 3 [55]. As small seed size in the "AA" genotype of Gm02_8896955 may be associated with other genes for HSW, further research, such as linkage analysis with a bi-parental mapping population, will be required to identify the genes that control seed weight in BLG accessions with the "AA" genotype of Gm02_8896955.
The amount of publicly available re-sequencing data on soybean has risen over the last decade [49][50][51][52]. Online tools and databases have been developed with genome sequencing data. In this study, the soybean allele catalog, an online tool, was used to identify the allelic variation of GmCYP78A57 through SoyKB [65], revealing 11 missense mutations and one deletion in exons (Figure 4). Of these, 10 variants of GmCYP78A57 were found in wild soybean accessions. This supports the view that wild soybean accessions carry a wide range of allelic variation.
The growth habits of soybean are classified as determinate, indeterminate, and semideterminate types [74]. Dt1 (Glyma.19G194300) is a homolog of Arabidopsis TERMINAL FLOWER 1 (TFL1). Mutations in the Dt1 gene cause the transition from the indeterminate to the determinate phenotype in soybeans [75]. Growth habit in soybeans is a critical trait that affects flowering time, plant height, number of nodes, and maturity, and thereby soybean production [75][76][77][78]. A second gene, Dt2 (Glyma.18G273600), which encodes a MADS-domain factor, was identified for the stem growth habit in soybean [74]. The semideterminate type is specified by the Dt2Dt2 genotype in a Dt1Dt1 background, whereas the Dt1Dt1dt2dt2 genotype has an indeterminate growth habit. In this study, a correlation analysis demonstrated that plant height was strongly positively correlated with the number of nodes (r = 0.84, p < 0.001). In addition, the most significant SNP (Gm19_45204441) was colocalized for plant height and number of nodes. In the haplotype block, a candidate gene for plant height and number of nodes in BLG accessions may be the Dt1 gene. Similarly, a GWAS with 419 diverse soybean plant introductions from 26 countries reported that the Dt1 gene showed pleiotropic effects on plant height and internode number [25].
Our previous study reported that BLG collections exhibited narrow genetic variability [55]. Over the long history of soybean cultivation in South Korea, BLG accessions may have spread over a wide range of geographical areas through distribution by farmers because of their better performance and yield. Consistent with this, the LD blocks around the significant SNPs on different chromosomes were large in this study. Moreover, the LD decay rate was approximately 1200 kb (Supplementary Figure S1). Although BLG accessions showed narrow genetic diversity, the HSW of BLG accessions in this study, ranging from 9.1 to 49.3 g, showed a wide phenotypic distribution. Thus, BLG accessions were suitable materials for studying HSW in soybean.
In conclusion, GWAS with 470 BLG accessions and 6K SNPs was conducted and identified significant SNPs for HSW on chromosomes 2 and 16. This study indicated that GmCYP78A57 may be a candidate gene for controlling seed size in soybeans, and that the QTL on chromosome 16 had a minor allelic effect on seed size. We assumed that another gene on chromosome 16 may have an additive genetic effect that reduces seed size along with GmCYP78A57. This information can support the development of molecular markers for use in soybean breeding programs to release new BLG cultivars with increased or decreased seed weight and improved soybean yield.