Genetic Diversity Patterns and Discrimination of 172 Korean Soybean (Glycine max (L.) Merrill) Varieties Based on SSR Analysis

Abstract: The soybean development goal in Korea has changed over time, but the pattern of genetic diversity in modern varieties has not yet been well characterized. In this study, 20 simple sequence repeat (SSR) markers generate a total of 344 alleles, where the number of alleles ranges from 7 to 29, with an average of 17.2 per locus, and the polymorphism information content (PIC) values range from 0.6799 to 0.9318, with an average of 0.8675. Five different clusters are classified using the unweighted pair group method with arithmetic mean (UPGMA). The genetic distance between clusters I and V (0.3382) is the farthest, and that between clusters III and IV (0.0819) is the closest. Among all pairings of groups classified by release period, the genetic distance is lowest (0.1909) between varieties developed in the 1990s and those from 2000 onward, and highest (0.5731) between varieties developed in the 1980s and those from 2000 onward. Model-based structure analysis revealed the presence of three sub-populations and 17 admixtures in the Korean soybean varieties. All 172 Korean soybean varieties were tested for discrimination using six SSR markers. The numbers of varieties discriminated in each step are as follows: 7 (4.1%) in step 1 (Sat_076), 73 (42.4%) in step 2 (Sat_417), 69 (40.1%) in step 3 (Sat_043), 13 (7.6%) in step 4 (Satt197), 8 (4.6%) in step 5 (Satt434), and 2 (1.2%) in step 6 (Satt179). These results, based on the analysis of genetic resources, can contribute to the creation of a core collection for soybean conservation and breeding, as well as to the development of future varieties with useful traits.
B.S.G., and J.S.; Resources, H.-S.K.; Data curation, T.-Y.H.; Writing (original draft preparation), T.-Y.H. and B.S.G.; Writing (review and editing), J.S. and H.-S.K.; Visualization, T.-Y.H.; Supervision, J.S.; Project administration, H.-S.K.; Funding acquisition, H.-S.K. All authors


Introduction
Soybean (Glycine max (L.) Merrill) has historically been an important source of protein and oil for Korean people and is widely grown in East Asia, including in Northeast China [1]. The cultivation of soybean in Korea is estimated to have begun around the fourth to fifth century BCE, and soybean has been common in the Korean diet since the era of the Three Kingdoms. Through the 1960s and 1970s, the goal of soybean variety development in Korea was to increase yield. In the 1980s, mechanization-adaptive varieties were developed in response to the declining labor force in rural areas. In the 1990s, soybean varieties were developed with the goal of breeding for diversification of use, multiplicity, and high quality, to allow for differentiation from imported varieties. In recent

DNA Extraction and SSR Analysis
Genomic DNA was extracted using the CTAB method [14]. Before PCR analysis, the DNA concentration was measured using a spectrophotometer (Nanodrop ND-1000, Thermo Scientific, Waltham, MA, USA) and adjusted to 20 ng/µL.
A total of 20 markers were used for the genetic diversity analysis (Table 2). They were selected from previous studies so that each linkage group was represented by one primer with high polymorphism [9,15]. PCR was performed in a reaction mixture of 20 µL containing 20 ng of total genomic DNA, 0.4 µM of each primer, 10 mM dNTP mixture, 10× PCR buffer, and 1 unit of Taq polymerase (Bioneer, Daejeon, Korea) using an MG96G thermal cycler (Longgene Scientific, Hangzhou, China). The amplification protocol included initial denaturation for 5 min at 94 °C; 35 cycles of denaturation for 30 s at 94 °C, annealing for 30 s at 47-57 °C, and extension for 60 s at 72 °C; and a final extension step for 10 min at 72 °C. Amplification products were resolved by electrophoresis through a 4% polyacrylamide sequencing gel at 1600 V for 100 to 120 min in 0.5× Tris-borate-ethylenediaminetetraacetic acid (TBE, Skaelskør, Denmark) buffer. The gel was visualized by silver staining [16].

Selection of SSR Markers for Discrimination of the Varieties
The 172 Korean soybean varieties were discriminated in six steps using six markers: five with high polymorphism (Sat_076, Sat_417, Sat_043, Satt197, and Satt434) and one, Satt179, with the lowest polymorphism. Discrimination started with the marker with the highest polymorphism in the first step, and the marker with the next highest polymorphism was added in each subsequent step. Four markers (Sat_076, Sat_417, Sat_043, and Satt197), which were highly diverse among the 20 markers [9], were initially selected. Additional markers (Satt434 and Satt179) were then selected to efficiently identify the varieties that remained undiscriminated.
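The stepwise procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `stepwise_discrimination`, the marker names `M1` to `M3`, the variety names, and all allele sizes are hypothetical, and a variety is treated as discriminated once its multi-locus profile is shared with no other variety.

```python
from collections import Counter

def stepwise_discrimination(genotypes, marker_order):
    """Add one marker per step and report the varieties that become unique.

    genotypes: dict mapping variety -> dict mapping marker -> allele size (bp)
    marker_order: markers in the order they are added (most polymorphic first)
    """
    remaining = set(genotypes)
    steps = []
    for step in range(1, len(marker_order) + 1):
        used = marker_order[:step]
        # Multi-locus profile of each still-unresolved variety.
        profile = {v: tuple(genotypes[v][m] for m in used) for v in remaining}
        counts = Counter(profile.values())
        # A variety is discriminated when no other variety shares its profile.
        unique = sorted(v for v in remaining if counts[profile[v]] == 1)
        steps.append((used[-1], unique))
        remaining -= set(unique)
    return steps, sorted(remaining)

# Hypothetical toy data (not taken from the study).
toy = {
    "V1": {"M1": 200, "M2": 150, "M3": 120},
    "V2": {"M1": 204, "M2": 150, "M3": 120},
    "V3": {"M1": 204, "M2": 154, "M3": 120},
    "V4": {"M1": 208, "M2": 150, "M3": 120},
    "V5": {"M1": 208, "M2": 150, "M3": 124},
}
steps, unresolved = stepwise_discrimination(toy, ["M1", "M2", "M3"])
for marker, found in steps:
    print(marker, found)
```

In this toy case, one variety is separated by the first marker, two more after the second is added, and the last two only after the third, which mirrors the role played by the final marker in a six-step scheme.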

Data Analysis
The number of alleles, major allele frequency, genetic diversity, and PIC values were analyzed using PowerMarker 3.25 software [17]. Genetic distances were obtained using PowerMarker, according to the method of Nei [18], and the phylogenetic tree was constructed using the unweighted pair group method with arithmetic mean (UPGMA) to compare group classification.
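For readers who wish to reproduce the per-locus summary statistics without PowerMarker, the following minimal sketch computes allele frequencies, major allele frequency, gene diversity, and PIC for a single locus. It assumes the commonly used definitions (gene diversity as 1 minus the sum of squared allele frequencies, and PIC as defined by Botstein et al.); PowerMarker's own estimators may apply additional corrections, and the allele sizes below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def allele_frequencies(alleles):
    """Relative frequency of each allele observed at one locus."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return {allele: c / n for allele, c in counts.items()}

def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())

def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    p = list(freqs.values())
    cross = sum(2 * (x * x) * (y * y) for x, y in combinations(p, 2))
    return gene_diversity(freqs) - cross

# Hypothetical locus: four alleles (sizes in bp), each seen in two varieties.
locus = [150, 150, 154, 154, 158, 158, 162, 162]
freqs = allele_frequencies(locus)
major = max(freqs.values())
print(len(freqs), major, gene_diversity(freqs), pic(freqs))
```

With four equally frequent alleles, the gene diversity is 0.75 and the PIC is slightly lower, which illustrates why PIC values approach 1 only for loci with many alleles of similar frequency.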
The software program STRUCTURE 2.3.3 [19,20] was utilized to detect possible sub-populations (K = 1 to K = 10) with a model allowing for admixture and correlated allele frequencies, using a burn-in of 100,000 and a run length of 100,000, with 5 iterations. The optimal number of populations corresponds to the highest peak in the ∆K graph [21], and the Korean soybean varieties with membership probabilities of more than 90% were assigned to the corresponding sub-population.

Genetic Diversity and Polymorphism of SSR Loci
Table 3 shows the results of genetic diversity analysis using the 20 SSR markers for the 172 soybean varieties developed in Korea. A total of 344 alleles were detected. The number of alleles ranged from 7 (Satt164 and Satt179) to 29 (Sat_076) per locus, with an average of 17.2. The size of alleles ranged from 107 to 361 bp; Sat_417 and Sat_190 had the widest ranges, of 139-229 bp and 125-215 bp, respectively, whereas Satt164 had the narrowest range, of 222-246 bp (Table S2). PIC values ranged from 0.6799 to 0.9318, with Sat_076 having the highest (0.9318) and Satt179 the lowest (0.6799). The average PIC value was 0.8675.
The numbers of varieties per cluster were as follows: 4 (2.3%) in cluster I, 52 (30.2%) in cluster II, 61 (35.5%) in cluster III, 46 (26.7%) in cluster IV, and 9 (5.2%) in cluster V. More than 92% of the total varieties were found in three clusters (II, III, and IV; Figure 1, Table S1 and Table 4). Calculation of genetic distance for pairwise combinations of all soybean varieties supported this conclusion (Table 5). Although the results were obtained with only a limited number of genotypes, clusters III and IV showed the lowest genetic distance (Table 5).
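As an illustration of how UPGMA builds such clusters from a distance matrix, the sketch below repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. It is a minimal pure-Python version written for clarity, not the PowerMarker implementation; the labels `A` to `D` and their pairwise distances are hypothetical.

```python
def upgma(labels, pair_dist):
    """Return the list of merges (cluster_a, cluster_b, node height).

    pair_dist maps a 2-tuple of labels to their distance.
    """
    size = {lab: 1 for lab in labels}
    d = {frozenset(pair): dist for pair, dist in pair_dist.items()}
    merges = []
    while len(size) > 1:
        # Closest pair of clusters; ties are broken alphabetically.
        pair = min(d, key=lambda k: (d[k], sorted(k)))
        a, b = sorted(pair)
        new = f"({a},{b})"
        merges.append((a, b, d[pair] / 2.0))
        # Distance from the merged cluster to every other cluster is the
        # average over all member pairs, i.e. weighted by cluster size.
        for c in size:
            if c not in pair:
                d[frozenset((new, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        size[new] = size[a] + size[b]
        del size[a], size[b]
    return merges

# Hypothetical pairwise distances between four varieties.
toy_dist = {
    ("A", "B"): 0.2, ("A", "C"): 0.5, ("B", "C"): 0.7,
    ("A", "D"): 0.9, ("B", "D"): 1.1, ("C", "D"): 1.0,
}
merges = upgma(["A", "B", "C", "D"], toy_dist)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

Cutting the resulting tree at a chosen height yields the cluster assignment; the choice of cut-off is a separate decision that this sketch does not address.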

Polymorphism of SSR Loci by Time Period of Release Years and Population Structure
The number of alleles and PIC values obtained from our SSR analysis of the 172 Korean soybean varieties, grouped by release year, are shown in Table 6. A total of 185 alleles were detected in the varieties developed before 1980, with an average of 9.3 alleles per locus and an average PIC value of 0.8188. In the varieties developed in the 1980s, the total number of alleles was 167, the average number of alleles was 8.4, and the average PIC value was 0.7913. A total of 251 alleles were detected in the varieties developed in the 1990s, with an average of 12.6 alleles per locus and an average PIC value of 0.8416. In the varieties developed since 2000, the total number of alleles was 315, the average number of alleles was 15.8, and the average PIC value was 0.8631. The varieties developed in the 1990s and since 2000 had more alleles and higher PIC values than those developed before 1980; the larger number of alleles is due to the difference in sample size.
The varieties in each cluster, as classified by release period, are shown in Table 4. We found that the 18 varieties developed before 1980 were distributed in clusters II and IV, with most of them (15) occurring in cluster IV. The 17 varieties developed in the 1980s were in clusters II, III, and IV, with most of these (12) in cluster IV. The 43 varieties developed in the 1990s were distributed throughout all clusters, with 17 varieties found in cluster IV. The varieties developed since the 2000s were distributed in all clusters, but mostly in clusters II and III (80%).
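The reported averages are consistent with dividing each group's total allele count by the 20 loci. The short check below makes that arithmetic explicit; it assumes conventional half-up rounding to one decimal place, which is why `Decimal` is used instead of the built-in `round`. The dictionary keys are labels chosen for this sketch.

```python
from decimal import Decimal, ROUND_HALF_UP

N_LOCI = 20  # number of SSR markers used in the study

# Total alleles per release-period group, as reported in the text.
total_alleles = {
    "before 1980": 185,
    "1980s": 167,
    "1990s": 251,
    "since 2000": 315,
}

def mean_alleles(total, n_loci=N_LOCI):
    """Average alleles per locus, rounded half-up to one decimal place."""
    value = Decimal(total) / Decimal(n_loci)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

means = {period: mean_alleles(t) for period, t in total_alleles.items()}
for period, value in means.items():
    print(period, value)
```

The same calculation applied to the overall total of 344 alleles gives the reported average of 17.2 alleles per locus.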
The genetic distances between all pairings of groups by release period are shown in Table 7. The genetic distance between varieties developed after 2000 and those developed in the 1980s was the farthest (0.5731), while that between varieties developed after 2000 and those developed in the 1990s was the closest (0.1909; Table 7).
The population structure of the 172 Korean soybean varieties was inferred using STRUCTURE 2.3.3, based on the 20 SSR markers. For each K value (K = 1-10), ∆K was calculated to determine the optimal value of K. The highest ∆K value was observed at K = 3 (Figure 2), giving three sub-populations, referred to as Pop. 1-3; 17 varieties were of the admixture type, with a membership probability < 90% (Figure 3). Most soybean varieties were distinguished according to release period. The result of population structure analysis is shown in Table 8. We found that only 18 varieties developed before or in the 1980s were in Pop. 3, except Iksan and Hwangkeumkong. The varieties developed since the 2000s were distributed in all sub-populations, but mostly in Pops. 1 and 2 (79%).
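The choice of K from ∆K and the 90% membership rule can be sketched as follows. The code assumes the common formulation in which ∆K is the absolute second-order difference of the mean log-likelihoods across replicate runs, divided by the standard deviation of the log-likelihood at K. All log-likelihood values and membership vectors below are hypothetical and serve only to show the mechanics.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """lnp maps K -> list of ln P(D) values from replicate runs.

    Returns K -> delta K for every K that has both neighbours.
    """
    m = {k: mean(v) for k, v in lnp.items()}
    out = {}
    for k in sorted(lnp):
        if k - 1 not in m or k + 1 not in m:
            continue  # delta K is undefined at the ends of the K range
        sd = stdev(lnp[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        out[k] = abs(m[k + 1] - 2 * m[k] + m[k - 1]) / sd
    return out

def assign(q, threshold=0.90):
    """Assign a variety to a sub-population if its top membership exceeds the threshold."""
    best = max(range(len(q)), key=q.__getitem__)
    return f"Pop. {best + 1}" if q[best] > threshold else "admixture"

# Hypothetical replicate log-likelihoods (five runs per K).
runs = {
    1: [-9002, -9001, -9000, -8999, -8998],
    2: [-8502, -8501, -8500, -8499, -8498],
    3: [-8002, -8001, -8000, -7999, -7998],
    4: [-7994, -7992, -7990, -7988, -7986],
    5: [-7989, -7987, -7985, -7983, -7981],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(best_k, assign([0.95, 0.03, 0.02]), assign([0.60, 0.30, 0.10]))
```

In the toy data, the log-likelihood rises steeply up to K = 3 and then flattens, so ∆K peaks at K = 3; real runs are usually noisier, and the peak should be read together with the likelihood curve itself.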

Discrimination of Soybean Varieties
In the first step, seven (4.1%) varieties were discriminated using Sat_076, which had the highest number of alleles and the highest PIC value; the remaining 165 varieties were not discriminated. In step 2, 73 (42.4%) varieties were discriminated among the 165 varieties by including Sat_417. In step 3, 69 (40.1%) varieties were discriminated among the remaining 92 varieties by including Sat_043. In step 4, 13 (7.6%) of the remaining 23 varieties were discriminated by including Satt197. In step 5, 8 (4.6%) of the remaining 10 varieties were discriminated by including Satt434. In the final step, the last two varieties (1.2%), Mansu and Nampung, were discriminated by including Satt179, allowing for discrimination of all 172 varieties. Of the 20 markers, Satt179 was the only one that could discriminate Mansu from Nampung (Table 8, Figure S2).
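The step counts above can be tallied to confirm that the number of varieties still unresolved after each step matches the text and reaches zero after step 6. The snippet only restates the reported counts; the variable names are chosen for this sketch.

```python
TOTAL = 172

# (marker added, number of varieties newly discriminated), as reported.
reported_steps = [
    ("Sat_076", 7),
    ("Sat_417", 73),
    ("Sat_043", 69),
    ("Satt197", 13),
    ("Satt434", 8),
    ("Satt179", 2),
]

remaining = TOTAL
rows = []
for marker, n_new in reported_steps:
    # Record how many varieties entered the step and how many are left after it.
    rows.append((marker, n_new, remaining, remaining - n_new))
    remaining -= n_new

for marker, n_new, before, after in rows:
    print(f"{marker}: {n_new} of {before} discriminated, {after} left")
```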

Discussion
Many genetic diversity analyses using DNA markers have been reported for the development of new varieties, the discrimination of varieties, and the search for useful genes. Song et al. [22] used 72 markers to analyze 185 accessions of genetic resources collected from Korea, China, Japan, India, Myanmar, the Philippines, and the United States, and detected 3-31 alleles per locus, with an average of 10.9 alleles. The accessions were classified into three groups, within which detailed groups were formed according to geographical origin. Kuroda et al. [23] used the genetic resources of 1318 native varieties and wild-type plants from China, Japan, Korea, and Russia to analyze the genetic diversity and develop a core collection. A high number of rare alleles were found in the wild types of Korea, suggesting that the ratio of soybean core collection to selection was high. Korea has its own wild soybean types, but other types of resources have also been introduced from both China and Japan, resulting in accessions with high genetic diversity. Wang et al. [24] analyzed 23 developed and native soybean varieties and 17 wild types using 40 SSR markers, and attributed the higher genetic diversity of the wild types to the loss of many alleles in cultivated soybean during the evolutionary process.
The genetic distances between all pairings of groups by release period show that the varieties from the 1990s and those from 2000 onward were the closest (0.1909), whereas the varieties from 2000 onward were the farthest from those developed before 1980 and in the 1980s (0.5587 and 0.5731, respectively; Table 7). The PIC value, in terms of release periods among Korean soybean varieties, was the highest in those developed from 2000 onward (Table 6). The reason for this finding is that, since the mid-1980s, various varieties have been developed with the aim of diversification of use and high quality. In addition, more varieties were developed in this period than in the other release periods. We also found that the greatest genetic diversity among varieties can be observed in those developed since the 2000s (Figure 1 and Figure S1, Table 4). This trend is expected to continue in the future.
Many genetic diversity analyses and varietal discrimination studies using DNA markers have been reported. Kim et al. [25] discriminated 82 of the 91 soybean varieties developed from 1913 to 2002 using five SSR markers (Sat_043, Sat_022, Sat_036, Sat_088, and Satt045) and reported that the unidentified varieties could not be distinguished using their morphological characteristics. Hwang et al. [26] developed 18 sequence-tagged site-cleaved amplified polymorphic sequence (STS-CAPS) markers (51 combinations) from base sequence information in prepared genomic DNA libraries, and discriminated 106 Korean soybean varieties in 14 steps. Gao et al. [27] discriminated 83 soybean varieties using nine markers with high polymorphism and high diversity within the population. The allele of each marker was coded to form a nine-digit identification (ID) for the discrimination of soybean varieties.
In Korea, the first soybean variety developed was Jangdanbaekmok, produced from a landrace through pure line isolation in 1913, and the first hybrid variety, characterized by high yielding ability and resistance to mosaic virus, was developed in 1969 [28]. By 2014, a total of 178 soybean varieties had been developed and registered at the two national institutes, the RDA-Genebank Information Center and the Korea Seed & Variety Service [28,29]. Lee et al. [29] identified a total of four pedigrees involving 168 varieties (94.4% of the 178 varieties), which form the broadest network of pedigrees. We analyzed genetic diversity using 172 Korean soybean varieties developed until 2013. Of these, 162 varieties are included in the four pedigrees; the results of the genetic diversity and pedigree analyses provide information for the selection of parental lines and the design of crossing strategies.
The full-length genome sequence of the American soybean variety Williams 82 was the first to be analyzed and published [30] and, using the sequence information, a large amount of information regarding candidate polymorphic SSR markers has also been published [31]. In Korea, Kim et al. [32] analyzed the full-length sequences of six Korean soybean varieties to divide the genomes into dense variation blocks (dVBs), in which SNPs accumulate, and sparse variation blocks with few SNPs. As dVBs have distinctive characteristics according to variety, various differences were indeed found to exist among the varieties. Therefore, 202 dVB-specific insertion/deletion (indel) markers have been developed [33]. A barcode system was established, which clearly distinguishes 147 Korean soybean varieties and is regularly updated whenever new varieties are developed [34,35]. The development of DNA markers has been simplified due to the active progression of genome research, but the development of DNA markers through full-length genome analysis in a small-scale laboratory is still difficult. As the marker analysis is not performed directly, polymorphism information is not known in full-length genome analysis. In this study, we analyzed 172 Korean soybean varieties developed from 1913 to 2013 using SSR markers that have been previously confirmed to be highly polymorphic. The analysis of genetic resources, including soybean varieties newly developed after 2014, in addition to the data accumulated in this study, will contribute to the development of future varieties with useful traits, such as drought and disease resistance, as well as to the determination of species and the creation of a core collection for soybean conservation and breeding.

Conclusions
In this study, the genetic diversity of 172 Korean soybean varieties was analyzed, and the varieties were discriminated, using SSR markers. Five different clusters were classified using the UPGMA method, among which the genetic distance between clusters I and V was the farthest, and that between clusters III and IV was the closest. Among all pairings of release-period groups, the genetic distance was the lowest between the varieties developed after 2000 and those developed in the 1990s, and the highest between the varieties developed after 2000 and those developed in the 1980s. The results demonstrate that the greatest genetic diversity among varieties was found in those that had been developed after 2000. The 172 varieties of Korean soybean were discriminated in six steps using six markers (Sat_076, Sat_417, Sat_043, Satt197, Satt434, and Satt179). The analysis of genetic resources, including newly developed varieties, in addition to the data accumulated in this study, will contribute to the creation of a core collection for soybean conservation and breeding, as well as to the development of future varieties with useful traits.
Supplementary Materials: The following are available online at http://www.mdpi.com/2077-0472/10/3/77/s1, Table S1: Number of soybean varieties classified by released year and breeding organization for this experiment, Table S2: Information of band size (bp) based on 20 markers for SSR analysis in 172 Korean soybean varieties, Figure S1: Principal coordinate analysis (PCoA) plot of 172 Korean soybean varieties using 20 SSR markers, Figure S2: Discrimination of varieties at each step by UPGMA dendrogram using six markers in 172 Korean soybean varieties.