Genetic Diversity of Soybeans (Glycine max (L.) Merr.) with Black Seed Coats and Green Cotyledons in Korean Germplasm

Soybeans (Glycine max (L.) Merr.) with black seed coats and green cotyledons are rich in anthocyanins and chlorophylls known as functional nutrients, antioxidants and compounds with anticarcinogenic properties. Understanding the genetic diversity of germplasm is important to determine effective strategies for improving the economic traits of these soybeans. We aimed to analyze the genetic diversity of 470 soybean accessions by 6K single nucleotide polymorphic loci to determine genetic architecture of the soybeans with black seed coats and green cotyledons. We found soybeans with black seed coats and green cotyledons showed narrow genetic variability in South Korea. The genotypic frequency of the d1d2 and psbM variants for green cotyledon indicated that soybean collections from Korea were intermingled with soybean accessions from Japan and China. Regarding the chlorophyll content, the nuclear gene variant pair d1d2 produced significantly higher chlorophyll a content than that of chloroplast genome psbM variants. Among the soybean accessions in this study, flower color plays an important role in the anthocyanin composition of seed coats. We provide 36 accessions as a core collection representing 99.5% of the genetic diversity from the total accessions used in this study to show potential as useful breeding materials for cultivars with black seed coats and green cotyledons.


Introduction
The composition of soybean (Glycine max [L.] Merr.) seeds is 40% protein, 20% oil, and 15% soluble carbohydrates, making it one of the most economically important crops in the world. Soybean production in western countries primarily focuses on producing high-protein meals for livestock and vegetable oils, whereas in Asian countries soybeans have traditionally been used as a staple food and consumed as soymilk, tofu, soy sprouts, fermented soy foods, and soy sauce [1,2]. Soybeans with black seed coats have been attracting interest as a soybean food [3]. Soybeans with black seed coats can be classified into two groups based on their cotyledon colors, which are either green or yellow. Soybean with black seed coats and green cotyledons (BLG) have been used as traditional ingredients in medicinal treatments in China, Japan, and Korea, unlike yellow commodity soybeans [4]. Several studies have reported that daily consumption of black soybeans is associated with a reduced risk of breast cancer and cardiovascular diseases due to their content of potentially active phytochemicals, such as isoflavones, sterols, phytic acid, saponins, and anthocyanins [5][6][7][8]. In addition, BLG soybeans are preferred by consumers for health benefits, and are often cooked with rice and other side dishes in Korea. the size, cost, and labor issues during evaluating an entire population by focusing on a limited but representative subset instead. Therefore, core subsets have been constructed for various crop species based on morphological and phenotypic observations [39][40][41][42][43][44][45][46][47]. Because quantitative phenotypic traits are often affected by environmental factors, a core subset that covers the genetic diversity of the entire population cannot be perfectly constructed by phenotype [48]. Molecular markers directly reflect genetic diversity rather than phenotypic assessments [49].
Due to increasing consumer awareness regarding the BLG soybean, it has become a preferred food ingredient soybean in South Korea. However, little information on the genetic diversity of BLG accessions has been obtained thus far. Korea is one of the centers of origin for domesticated soybeans, with a long history of cultivation [50]. There arẽ 17,000 improved and landrace cultivars (Glycine max) in the National Agrobiodiversity Center of Rural Development Administration in Korea [51]. Among them, we used 405 randomly selected collections in this study. The objective of this study was to analyze the genetic diversity of BLG accessions by 6K single nucleotide polymorphism (SNP) loci to understand their genetic architecture. In addition, a core subset of accessions was selected to represent the BLG accessions.

Growth Conditions of BLG Germplasms
To understand the genetic diversity, a collection of 405 BLG accessions was distributed from the National Agrobiodiversity Center in Jeonju, Republic of Korea. The collection comprised 385 landraces, eight breeding lines, five unknown, and seven plant inductions from the U.S. National Genetic Resources Program (Table S1). Among the 405 BLG accessions, 397 were collected from Korea, two from Japan, one from China, one from the U.S., and four were unknown. However, 47 accessions from the National Agrobiodiversity Center showed different seed characteristics, plant appearance, flowering and maturity. Therefore, we conducted a pure line selection and added 47 accessions as independent individuals (Table S2). Fifteen additional BLG accessions were locally collected from Gyeongsanbuk-do, Republic of Korea. Two BLG cultivars with green cotyledons, Cheongja [52], Cheongja 3 [53], and soybean cultivar with the yellow seed coat and yellow cotyledon, Uram [53], were used as check cultivars for this study. The 470 accessions including three check cultivars formed the total population and were used for further analyses. The entire set of 470 accessions was grown at Gyeongsanbuk-do Agricultural Research and Extension, Daegu, Republic of Korea over three years and planting dates (14 June 2013, 29 May 2014, and 15 June 2015. Each soybean accession was planted in single rows that were 1.5 m long and spaced 80 cm apart, with two replications. Seeds were planted by hand on hills in rows spaced 15 cm apart and thinned to a final stand of two seedlings per hill. Each plot grown from each year was harvested in bulk at the plant's full maturity (R8 stage) [54] for further seed analysis.

DNA Extraction and Determination of Genotyping for Soybean Accessions
Young trifoliate leaves were collected from soybean accessions with three check cultivars in the summer of 2015. Before DNA extraction, the leaves of each line were frozen in liquid nitrogen and ground into a fine powder. Next, 20 mg of leaf tissue from each sample was placed into tubes, and each DNA sample was isolated using the cetyltrimethylammonium bromide method with a minor modification [55]. Quantification and qualification of the genomic DNA of each accession was determined by electrophoresis running on 1.5% agarose gel. Next, 30 µL of genomic DNA at 100 ng/µL from 470 accessions, including three check cultivars, Cheongja, Cheongja 3, and Uram, were sent to the National Instrumentation Center for Environmental Management (NICEM; Seoul, Korea) at Seoul National University to genotype the soybean accessions using BARCsoySNP6K BeadChip [56]. The NICEM staff performed the assay procedures encompassing a series of approaches, such as incubation, DNA amplification, preparation of the bead assay, hybridization of samples of the bead assay, extension, staining of the samples, and imaging of the bead assay [57]. The SNP alleles were called using the Genome Studio Genotyping Module (Illumina, Inc. San Diego, CA, USA) [57]. Total 4459 SNPs were used for further analyses after filtering through the TASSEL software to exclude those with >20% missing data and rare SNPs.

Basic Population Genetic Parameters, Population Structure, and Construction of a Core Subset Accession
Minor allele frequency, genetic diversity index, polymorphism information content (PIC), and heterozygosity were evaluated using 469 BLG accessions with Uram and 4459 SNP markers using PowerMarker 3.25 software [58]. Principal component analysis (PCA) was conducted with 470 soybean accessions using the R package SNPRelate tool. This plot showed the first principal component (PC1) against the second principal component (PC2). For the phylogenetic tree in the present study, an unweighted pair group method with arithmetic mean (UPGMA) tree was constructed with entire accessions using the calculation of the distance based on a modified Euclidean distance between each pair of accessions by TASSEL [59]. The admixture model was used to analyze the genotypic data using STRUCTURE software [60], which is one of the most widely used genotypic clustering software. Three runs of STRUCTURE were executed for each number of the population (K) from 1 to 10. The burn-in time and replication number were set to 100,000 in each run. For the construction of a core collection, the 4,459 SNP genotypic data of the total BLG accessions was analyzed using GenoCore [61]. In this study, 99.5% of the coverage and 0.001% of the delta value were applied to select accessions for a core subset representing the entire collection.

Genotyping Assays for D1, D2, and PsbM
To genotype D1 for the presence of the 1-bp deletion [29], a polymerase chain reaction (PCR) was conducted using a forward primer 5 -CGTTGTTGGGTTTGTCTGATGG-3 , and two reverse primers, 5 -GCGGGCTCGTCCACTCCTAAGAATAAAACC-3 and 5 -GCGGGCAGGGCGGCTCGTCCACTCCTAAGAATAAACC-3 . The total volume of the PCR reaction was 20 µL, containing 1× real-time PCR smart mixtures (2.5 U/µL h-Taq DNA polymerase, 1× h-Taq reaction buffer, and 200 µM dNTP) with EvaGreen (SolGent Co. Ltd. Daejeon, Korea), 0.5 µM primers, and 5-50 ng of genomic DNA template. The PCR conditions for the D1 assay were used for the Gene Touch PCR thermal cycler (Hangzhou Bioer Technology Co. Ltd., Hangzou, China) as follows: 95 • C for 15 min, followed by 35 cycles of 95 • C for 20 s, 60 • C for 20 s, and 72 • C for 20 s. Melting curve analysis was conducted using a Roche LightCycler 480 II (Roche Applied Sciences, Indianapolis, IN, USA) by increasing the temperature from 65 • C to 90 • C and reading every 0.1 • C. The homozygous wild type allele of the D1 gene was detected with a peak at 86 • C, and the homozygous mutant allele containing the D1 gene showed a peak at 87 • C.

Phenotype Determination by High Performance Liquid Chromatography (HPLC)
The anthocyanin and chlorophyll contents in the BLG accessions were determined using Thermo Scientific Dionex UltiMate 3000 HPLC (Thermo Scientific Dionex, Waltham, MA, USA). To estimate the genetic and environmental variations, the anthocyanin and chlorophyll contents were measured for seeds from each replicated plot during the three years. For anthocyanin, a hand-peeled seed coat (0.1 g) from each accession, including three check cultivars, were ground into a powder. Next, 100 mg of powder was extracted with 10 mL of 1% HCl and 20% MeOH for 24 h at 4 • C in a shaking incubator at 110 rpm/min in darkness. The extracted solutions were filtered through Whatman No. 2 filter paper and a syringe filter (0.2 µm). Six extracted anthocyanins (delphinidin 3-O-glucoside, cyanidin 3-O-glucoside, petunidin 3-O-glucoside, pelargonidin 3-O-glucoside, peonidin 3-O-glucoside, and malvidin 3-O-glucosie) were separated in a YMC-pack Pro C18 RS analytical column (250 mm × 4.6 mm, 5 µm). Next, 10 µL of extracted anthocyanins was injected into the column at 1.0 mL/min rates at a temperature of 30 • C. The mobile phases were composed of H 2 O/HCOOH (90/10, v/v) (mobile phase A) and CH 3 CN/CH 3 OH/H 2 O/HCOOH (22.5/22.5/40/10, v/v) (mobile phase B). The gradient conditions were as follows: 0 min 7% B, 35 min 35% B, 45 min 65% B, 46 min 100% B. Each anthocyanin was detected using a VIS detector at 520 nm and was quantified based on the standard curves generated for each anthocyanin. The anthocyanin content was converted to milligram per gram (mg/g).
For the chlorophyll content, green cotyledons without a seed coat from each accession, including three check cultivars, were ground into powder. One gram of powder was extracted with 10 mL of 85% (CH 3 ) 2 CO for three hours at 40 • C in a shaking incubator at 110 rpm/min in darkness. The extracted solutions were filtered through Whatman No. 2 filter paper and a syringe filter (0.2 µm). Two extracted chlorophylls (Chl a, and Chl b) were separated in a YMC-pack ODS-A analytical column (150 mm × 6.0 mm, 5 µm). Next, 20 µL of extracted chlorophylls was injected into the column with 1.0 mL/min rates at a temperature of 30 • C. The mobile phases were composed of 75% MeOH in water (mobile phase C) and 100% EtOAc (mobile phase D). The gradient conditions were as follows: 0 min, 30% D; 15 min, 90% D; 20 min, 30% D. Each chlorophyll was detected using a UV-vis detector at 430 nm and was quantified based on the standard curves generated for each chlorophyll content. Each chlorophyll content was converted to microgram per gram (µg/g).

Genome Wide Association Study (GWAS) for Anthocyanin
GWAS was conducted using TASSEL software and the GAPIT R package. PCA was constructed with 4459 SNPs [62]. The kinship coefficient matrix was used to provide an estimate of additive genetic variance [62,63]. In the present study, we used models wherein a mixed linear model produced p values to populate Manhattan plots [63,64]. The significance of associations between SNPs and traits was based on Bonferroni's correction and false discovery rate analyses.

Statistical Data Analysis
All statistical analyses in this study were conducted in SAS v9.4 (SAS Institute, Cary, NC, USA, 2013). Analysis of variance (ANOVA) was conducted to evaluate differences over the three years using PROC GLM in SAS. A comparison of the measured chlorophyll and anthocyanin between the two groups was determined using genotyping, and a Student's t-test analysis (p ≤ 0.05) was conducted using PROC TTEST in SAS.

Genetic Diversity and Population Structure
To estimate the genetic diversity index of BLG accessions for Korean germplasm, 6K SNP markers were used. The genetic diversity index per SNP marker varied from 0.02 to 0.59, with an average of 0.26 ( Figure 1A). The PIC value revealed allelic diversity and frequency of the SNP markers ( Figure 1B). PIC values of 0.512 and 0.017 were revealed as the maximum and minimum, respectively, with an average of 0.220. The SNP (Gm11_7661182_T_C) at 7,661,182 on chromosome 11 was the marker that showed the most diversity in this study based on the genetic diversity index and the PIC value. Minor allele frequencies ranged from 0.01 to 0.50, with an average of 0.18 ( Figure 1C).
Heterozygotes were rare for most of the analyzed SNP markers, and 15 SNP markers showed >0.5 heterozygote frequency with the BLG accessions ( Figure 1D). and the genetic diversity of entire BLG accessions ( Figures 2B,C). Most accessions in cluster 1 showed no genetic differences based on PCA ( Figure 2B) and the phylogenetic tree ( Figure 2C). The second highest value of ΔK was at K = 3 as the final number of subpopulations in the present study. Therefore, these soybean germplasms could be divided into three clusters, which were colored gray for cluster 1, orange for cluster 2, and blue for cluster 3 (Figure 2A-C). PCA and phylogenetic tree analyses showed good agreement with K = 3 from the STRUTURE result ( Figures 2B,C). Cluster 1 comprised 231 accessions, whereas the other two clusters comprised 107 in cluster 2 and 132 accessions in cluster 3 (Table S1). Two BLG cultivars, Cheongja and Cheongja 3, and the yellow soybean cultivar, Uram, as a check, belonged to cluster 2 ( Figure 2B,C).  Population clustering was performed using STRUCTURE software to identify a possible population structure without introducing any prior information. The number of subpopulations could not be identified from the plot of Ln (likelihood probability) for K. However, the ∆K value identified the choice of K = 2 as the highest structural level. Admixture plots of K from 2 to 5 are shown in Figure 2A. Both PCA and UPGMA phylogenetic tree were analyzed to validate the number of clusters from the STRUCTURE result and the genetic diversity of entire BLG accessions ( Figure 2B,C). Most accessions in cluster 1 showed no genetic differences based on PCA ( Figure 2B) and the phylogenetic tree ( Figure 2C). The second highest value of ∆K was at K = 3 as the final number of subpopulations in the present study. Therefore, these soybean germplasms could be divided into three clusters, which were colored gray for cluster 1, orange for cluster 2, and blue for cluster 3 (Figure 2A-C). PCA and phylogenetic tree analyses showed good agreement with K = 3 from the STRUTURE result ( Figure 2B,C). Cluster 1 comprised 231 accessions, whereas the other two clusters comprised 107 in cluster 2 and 132 accessions in cluster 3 (Table S1). Two BLG cultivars, Cheongja and Cheongja 3, and the yellow soybean cultivar, Uram, as a check, belonged to cluster 2 ( Figure 2B,C).

Construction of Core Collection
To construct the core collection, 4459 SNPs from the 467 BLG accessions were used. The number of the core collection included thirty-six accessions accounting for 7.7% of the total population (Table S1). The core collection explained 99.5% of the genetic diversity of the total population. Among the core subset, 32 accessions belonged to cluster 2, accounting for 89.0% of the core collections ( Figure 3A). PCA was performed to reveal the core collection representing the genetic diversity of the total population. The result showed that the selected accessions in the core collection by GenoCore evenly covered the total populations of PC1 and PC2 ( Figure 3B).

Construction of Core Collection
To construct the core collection, 4459 SNPs from the 467 BLG accessions were used. The number of the core collection included thirty-six accessions accounting for 7.7% of the total population (Table S1). The core collection explained 99.5% of the genetic diversity of the total population. Among the core subset, 32 accessions belonged to cluster 2, accounting for 89.0% of the core collections ( Figure 3A). PCA was performed to reveal the core collection representing the genetic diversity of the total population. The result showed that the selected accessions in the core collection by GenoCore evenly covered the total populations of PC1 and PC2 ( Figure 3B).

Frequency of the D1, D2, and PsbM Alleles and Their Relationship with the Chlorophyll Content in Cotyledon
Genotyping of D1, D2, and PsbM for the total accessions was conducted to determine the frequency of genes in the Korean soybean germplasm. The result revealed that the 467 BLG accessions and the two check BLG cultivars contained either the double recessive mutant alleles of the D1 and D2 nuclear genes or the 5-bp insertion in the chloroplast gene, PsbM. However, accession BLG397 (IT263333) and BLG467 (IT263849) had heterozygous D1 and D2, whereas BLG397 contained a variant of the PsbM gene (Table S1)

Frequency of the D1, D2, and PsbM Alleles and Their Relationship with the Chlorophyll Content in Cotyledon
Genotyping of D1, D2, and PsbM for the total accessions was conducted to determine the frequency of genes in the Korean soybean germplasm. The result revealed that the 467 BLG accessions and the two check BLG cultivars contained either the double recessive mutant alleles of the D1 and D2 nuclear genes or the 5-bp insertion in the chloroplast gene, PsbM. However, accession BLG397 (IT263333) and BLG467 (IT263849) had heterozygous D1 and D2, whereas BLG397 contained a variant of the PsbM gene (Table S1). Seventy-six percent of the analyzed total accessions carried psbM, whereas 24% contained double mutant alleles of D1 and D2 ( Figure 4A). The genotype frequency of d1d2

Frequency of the D1, D2, and PsbM Alleles and Their Relationship with the Chlorophyll Content in Cotyledon
Genotyping of D1, D2, and PsbM for the total accessions was conducted to determine the frequency of genes in the Korean soybean germplasm. The result revealed that the 467 BLG accessions and the two check BLG cultivars contained either the double recessive mutant alleles of the D1 and D2 nuclear genes or the 5-bp insertion in the chloroplast gene, PsbM. However, accession BLG397 (IT263333) and BLG467 (IT263849) had heterozygous D1 and D2, whereas BLG397 contained a variant of the PsbM gene (Table S1). Seventy-six percent of the analyzed total accessions carried psbM, whereas 24% contained double mutant alleles of D1 and D2 ( Figure 4A    To identify the relationship of the cloned genes for stay-green and seed phenotype, the seed chlorophyll content was measured for the total BLG accessions using HPLC. The Chl a, Chl b, and total chlorophyll content of the total collection ranged from 11.5 µg/g to 88.4 µg/g with the mean of 33.9 µg/g, from 7.7 µg/g to 40.8 µg/g with the mean of 21.9 µg/g, and from 22.6 µg/g to 120.0 µg/g with the mean of 55.8 µg, respectively ( Figure S1). The chlorophyll a/b ratio ranged from 0.8 to 5.3, with the mean of 1.8. Mean values of the measured chlorophyll content of individual accessions in two stay-green genotypic groups, d1d2 and psbM, are shown in Figure 5. The Chl a and total chlorophyll contents in the d1d2 genotypic group were significantly higher than those of the psbM genotypic group, whereas Chl b in the d1d2 genotypic group was statistically lower than that of the psbM genotypes ( Figure 5A). The chlorophyll a/b ratio in the d1d2 genotypes was 4-fold higher than that of the psbM genotypic group in the total population ( Figure 5C). Each measured chlorophyll content was significant for the genotype, environment, and genotype by environment interaction during the three years (Table S3). To identify the relationship of the cloned genes for stay-green and seed phenotype, the seed chlorophyll content was measured for the total BLG accessions using HPLC. The Chl a, Chl b, and total chlorophyll content of the total collection ranged from 11.5 µg/g to 88.4 µg/g with the mean of 33.9 µg/g, from 7.7 µg/g to 40.8 µg/g with the mean of 21.9 µg/g, and from 22.6 µg/g to 120.0 µg/g with the mean of 55.8 µg, respectively ( Figure S1). The chlorophyll a/b ratio ranged from 0.8 to 5.3, with the mean of 1.8. Mean values of the measured chlorophyll content of individual accessions in two stay-green genotypic groups, d1d2 and psbM, are shown in Figure 5. The Chl a and total chlorophyll contents in the d1d2 genotypic group were significantly higher than those of the psbM genotypic group, whereas Chl b in the d1d2 genotypic group was statistically lower than that of the psbM genotypes ( Figure 5A). The chlorophyll a/b ratio in the d1d2 genotypes was ~4-fold higher than that of the psbM genotypic group in the total population ( Figure 5C). Each measured chlorophyll content was significant for the genotype, environment, and genotype by environment interaction during the three years (Table S3).
The seed coat color (I locus), pubescence color (T locus), and flower color (W1 locus) were involved in the anthocyanin pathway ( Figure 6A). All accessions in this study, except Uram, were black seed coat (ii) and tawny pubescence color (TT). Regarding the flower color, 2.6% of the total accessions (12/469 accessions) except the yellow soybean check showed a white flower (w1w1). For delphinidin 3-O-glucoside, petunidin 3-O-glucodsied, and malvidin 3-O-glucoside, ones with the w1w1 allele (white flower) were significantly lower than ones with the W1W1 allele (purple flower) ( Figure 6B). There was no significant difference between the white and purple flower regarding cyanidin 3-O-glucoside, peonidin 3-O-glucoside, and pelargonidin 3-O-glucoside. The total measured anthocyanins content was 13.1 mg/g and 8.1 mg/g for the purple and white flower, respectively. To identify the genes controlling the anthocyanin content, GWAS was performed with 4459 SNPs ( Figure 6C). For cyanidin 3-O-glucoside and the total anthocyanin content, the most significant SNPs were colocalized at 4,873,149 (Wm82.a2.v1 assembly) on chromosome 8. This SNP was near to the O locus, which corresponded with an anthocyanidin reductase gene. For delphinidin 3-O-glucoside and pelargonidin 3-O-glucoside, SNPs were located on chromosome 9, which were near the R locus. Agronomy 2021, 11, x FOR PEER REVIEW 11 of 17

Discussion
Molecular markers can directly reflect genetic diversity, in contrast to phenotypic assessments, because quantitative traits are often influenced by environmental factors [65]. The average genetic diversity index and PIC value of SNPs for the total BLG accessions were 0.26 and 0.22, respectively ( Figure 1). However, Liu et al. [36] reported a genetic diversity study on soybean cultivars and advanced breeding lines from the U.S. and China, for which the genetic diversity index and PIC values were 0.3489 and 0.2769, respectively. Hao et al. [66] primarily used Chinese soybean landraces for a genetic diversity study, which resulted in 0.391 for the genetic diversity index and 0.313 for the PIC value, respectively. In addition, Lee et al. [34] and Yoon et al. [67] reported similar genetic diversity indexes for Korean soybean accessions, with an average of 0.70 and 0.71, respectively. The genetic diversity of a population is affected by the number of markers, diversity index of markers, number of accessions, and geographical distribution of accessions used in study. Compared with previously reported results for genetic diversity studies of soybean, the level of genetic diversity of the BLG collections was relatively low because we used only soybean accessions having the BLG phenotype, which were intensively collected from Korea [34,36,[66][67][68][69]. The population structure analysis used in the present study revealed that there were three clusters in the 469 BLG accessions (Figure 2A). PCA and the UPGMA-phylogenetic tree showed that cluster 1 and cluster 3 had relatively lower genetic diversity than cluster 2 ( Figure 2B,C). The result revealed that BLG collections in cluster 1 and cluster 3 may spread over a wide range of geographical areas by farmer's distribution due to a better performance and yield for a long history of soybean cultivation in South Korea. Those accessions have been collected from different areas by the National Agrobiodiversity Center of Rural Development Administration in Korea [51]. Considered together, this may be one of the primary reasons that the BLG collections exhibited narrow genetic variability in this study.
Studies have shown that 5%-20% of the total population was generally constructed as a core collection, thereby representing 99.0% of the genetic diversity of the total population [65,70]. The size of the core collections could be determined by the genetic diversity of the total population [65]. In this study with 4,459 SNPs, 36 accessions were selected as the core collection to improve the utilization of useful breeding material for BLG phenotypes. Although the number of the core collection accounted for 7.7% of the total population, the core collection represented 99.5% of the genetic diversity of the entire population ( Figure 3). The core collection was evenly distributed for the total population. Therefore, 36 accessions can be used as breeding materials for the BLG trait in the soybean breeding program in South Korea.
In this study, we found that all 469 BLG accessions contained the same mutation of the d1d2 or psbM allele (Figure 4). Kohzuma et al. [31] revealed that most of the green cotyledon soybean strains from Japan carried psbM (99.5%), whereas all Chinese green cotyledon soybeans consisted of d1d2 (100%). However, the Korean accessions showed both d1d2 (24.1%) and psbM (75.9%) ( Figure 4A). In addition, we determined the frequencies of d1d2 and psbM in three clusters ( Figure 4C-E), which were different by three clusters: 100.0% of cluster l, 80.0% of cluster 2, and 32.0% of cluster 3 had the psbM allele ( Figure 4). Cluster 3 was divided into two subgroups based on the PCA analysis ( Figure 2B). Note that all accessions in the light blue of cluster 3 contained the d1d2 genotype, whereas the ones in the dark blue of cluster 3 had green cotyledons due to the psbM allele ( Figure 2B,C). Several studies mentioned that Korean soybeans were closely related to Japanese soybeans and were genetically distinct from the Chinese soybean population [34,35,71]. The genotypic frequency of the d1d2 and psbM alleles in this study supported that soybean accessions from Korea were a mixture of genetic contexts related to soybean from both Japan and China [72].
Because chlorophyll has anticarcinogenic properties [73,74], increasing the chlorophyll content is one of the targeted traits in breeding programs for soybean-based foods. The compositions of chlorophyll in green cotyledon were different based on the d1d2 or psbM genotypes. The amount of Chl a in d1d2 was~2-fold higher than Chl a in the psbM genotype ( Figure 5). The Chl a content was 4-fold higher than the Chl b content in the d1d2 genotypes, which resulted in a higher chlorophyll a/b ratio than that of the psbM genotype. Our result supported that d1d2 genotypes contain higher chlorophyll content [75]. In the soybean breeding program, d1d2 genotypes may be a good genetic resource for increasing the total chlorophyll contents in cotyledon seeds.
The soybean seed coat is controlled by multiple loci, such as I, T, W1, R, and O loci involved in the anthocyanin biosynthetic pathway [76][77][78]. The I locus encodes chalcone synthase, which is the key enzyme of the flavonoid pathway ( Figure 6A) [79]. A black seed coat forms primarily due to the accumulation of anthocyanins and is controlled by the i allele. In most soybean with black seed coats, the three major anthocyanins are cyanidin 3-O-glucoside, delphinidin 3-O-glucoside, and petunidin 3-O-glucoside [17]. Lee et al. [16] reported six additional anthocyanin compounds from black seed coats, including a major amount of pelargonidin 3-O-glucosied and a minor amount of peonidin 3-Oglucoside. In the present study, we analyzed six anthocyanins and found that cyanidin 3-Oglucoside, delphinidin 3-O-glucoside, and petunidin 3-O-glucoside were major components of anthocyanin compositions in total BLG accessions ( Figure 6B). The W1 locus, flavonoid 3 ,5 -hydroxylase (F3 5 H), plays an important role in the accumulation of delphinidin 3-O-glucoside, petunidin 3-O-glucoside, and malvidin 3-O-glucoside under the genetic background of the iiTT genotype [80]. Recently, Kim et al. [81] reported that the iiRRTTw1w1 genotype would prohibit the accumulation of delphinidin 3-O-glucoside and petunidin 3-O-glucoside compounds, whereas the T locus, flavonoid 3 -hydroxylase (F3 H), can control the production of cyanidin-derived and delphinidin-derived anthocyanin compounds. The result of this study was that the purple flower (W1W1) produced~5.0 mg/g more total anthocyanin ( Figure 6B). Flower color can be used as a possible selection trait to increase the total anthocyanin composition in the black soybean breeding program.
Although 97.4% of the total BLG accessions showed a black seed coat (ii allele), tawny pubescence (TT allele), and purple flower (W1W1 allele) in the present study, each anthocyanin composition still showed phenotypic variations. The GWAS result showed that SNPs on chromosomes 8 and 9 were associated with the production of delphinidin 3-O-glucoside, pelargonidin 3-O-glucoside, cyanidin 3-O-glucoside, and total anthocyanins. These SNPs were near the R locus and O locus on chromosomes 8 and 9, respectively ( Figure 6C). The R locus is the R2R3 MYB transcription factor for upregulating UDPglycose: flavonoid 3-O-glycosyltranferase (UF3GT) in black soybeans [76]. For the O locus, the expression level of anthocyanidin reductase (ANR) in the soybean genome is associated with the red-brown and black color of the soybean seed coat [82]. The GWAS result concluded that the extent of the expression levels of UF3GT and ANR may be associated with the amount of anthocyanin in BLG soybean.

Conclusions
In conclusion, BLG soybeans exhibited narrow genetic variability due to artificial selection, and a core collection representing the genetic diversity of the total BLG soybeans was constructed based on the 6K SNP genotypes. This core collection can provide useful genetic resources for the development of new cultivars for BLG seeds with increased anthocyanin and chlorophyll contents as part of a breeding program for soybean food.
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-439 5/11/3/581/s1, Figure S1: Frequency distribution of an average of two chlorophyll content and sum of total chlorophyll contents in 469 accessions over three years, Figure S2: Frequency distribution of an average of two six anthocyanin content and sum of total anthocyanin contents in 469 accessions over three year, Table S1: List of 470 soybean accessions, subpopulation, core collection and genotype of D1, D2 and PsbM allele, Table S2: List of pure line selection from 47 accessions, Table S3: Mean squares from analysis of variances of each measured chlorophyll for soybean accessions over 3 years, Table S4: Mean squares from analysis of variances of each measured anthocyanin for soybean accessions over 3 years.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
The datasets generated during this study are available from the corresponding author on reasonable request.