The Evaluation of Agronomic Traits of Wild Soybean Accessions (Glycine soja Sieb. and Zucc.) in Heilongjiang Province, China

Abstract: Wild soybean germplasm is distributed widely in China, particularly in Heilongjiang Province. In this study, 242 wild soybean accessions from four agricultural divisions in Heilongjiang Province were evaluated based on six qualitative and eight quantitative traits. The evaluated traits showed a large amount of variation. The most frequent qualitative traits were the wild type (78.51%), purple flowers (90.50%), needle leaf (39.26%), black seed (83.88%), brown hilum (52.07%), and mud film (87.60%). Multivariate analysis of the quantitative traits showed that 100-seed weight, seed weight per plant, number of seeds per plant, number of effective pods, and number of invalid pods were significantly different among samples (p < 0.05). A total of 27 germplasms were screened. Cluster analysis divided the 242 accessions into two groups that did not follow the geographical distribution pattern, with rich wild soybean resources revealed in the northern site. The present study indicates that wild soybean in Heilongjiang Province should be conserved in situ. The rich genetic diversity revealed in wild soybeans from different sites in Heilongjiang Province suggests significant potential for genetic improvement and breakthroughs in soybean breeding. This information will help to exploit and conserve wild soybean accessions in Heilongjiang Province.


Introduction
Cultivated soybean (Glycine max (L.) Merr.) is one of the most economically important crop plants worldwide, serving as food for both humans and livestock [1]. Additionally, cultivated soybean is the main source of the world's protein meal and accounts for more than half of global oilseed production [2]. Although soybean yield per acre has increased over the past century, soybean production cannot meet the demand of the growing world population. Various technologies have been developed to breed high-yielding soybean lines [3,4]. However, the narrow genetic base of soybean (i.e., a bottleneck of genetic diversity) has hindered its improvement [5,6], significantly limiting the development of lines with high yield, high resistance, and environmental stress tolerance. Thus, there is an urgent need to explore rich resources of genetic diversity to improve soybean production [7,8]. To date, it is commonly believed that wild soybean contains important genes for adaptation to harsh environments, including salt and insect stress. These genes are expected to be transferable into domesticated soybeans because there is no breeding barrier between wild and cultivated soybeans.
As an important resource for improving the beneficial characteristics of domesticated soybeans, wild soybean (Glycine soja Siebold and Zucc.) contains high genetic diversity. It has been suggested that wild soybean is native to East Asia and is distributed across a broad geographic range. In China, soybean was domesticated 6000-9000 years ago [9], and wild soybeans are widely distributed. In particular, soybean is one of the main crops in Heilongjiang Province in the north of China, which harbors a large number of wild soybean resources due to its unique geographical and ecological environment. The sustainable utilization of wild soybean resources will provide the foundation for soybean innovation and production in this area and help advance the soybean industry. The comprehensive evaluation of agronomic traits of wild soybean can enrich the genetic base of cultivated soybean, enable breakthroughs in soybean breeding, enhance soybean stress resistance, support the sustainable development of the soybean industry, and promote the protection and effective use of wild soybean resources [10][11][12]. In this study, we investigated the agronomic traits of 242 wild soybean samples, providing a foundation for the future application of wild soybeans to improve the quality of cultivated soybeans.

Plant Materials and Field Experiments
A total of 242 wild soybean (G. soja) germplasms were used in this study. The samples were collected from 13 cities of Heilongjiang Province and were geographically distributed in four regions, namely the northern (Region I), eastern (Region II), southern (Region III), and western (Region IV) sites of Heilongjiang Province (Table 1 and Figure 1). These four regions were categorized based on their topography, soil, and climate. Specifically, Region I had a cool and humid climate and was characterized by wide, rounded mountains with wide and shallow river valleys. Region II was an agricultural and pastoral area on low, flat terrain with fertile soil and plentiful water resources. Region III contained complex landforms with various vegetation types, interlaced agriculture and forestry, and abundant water resources. In Region IV, the topography and soil conditions were not conducive to tree growth, and meadow steppe was the main vegetation type. Based on these four major agricultural regions of Heilongjiang Province, all wild soybean resources were collected and preserved through field investigation (Table 1). Seeds were obtained by self-breeding. The growth habits of wild soybean differed from those of cultivated soybeans, including stolon habits, high plant height, and thin stems. Thus, in this study, a completely randomized block design (i.e., a 0.65 m × 1 m plot containing four holes) was used with three replications for each sample. One seedling was sown in each hole. Starting about 2 weeks after germination, routine management was performed throughout the experiment. To avoid plant twist, each plant was supported by bamboo sticks.
All experiments were carried out at the Academy of Agricultural Sciences National Agricultural Demonstration Zone of Heilongjiang Province, in the summers of 2012 and 2013. A total of 14 agronomic traits were investigated in the soybean samples (Figure 2). The six qualitative traits were evolutionary type ("wild" as recognized by 100-seed weight less than or equal to 3 g with main stem and "semi-wild" identified as 100-seed weight > 3.01 g without main stem), flower color (white and purple), leaf shape (needle, oval, ellipse, and linear), seed color (yellow, green, black, brown, and dichromatism), hilum color (black and brown), and bloom habit (mud film, mud-free film, and lustrous). The eight quantitative traits collected at maturity were 100-seed weight (g), seed weight per plant (g), number of seeds per plant, number of effective pods (with seeds) per plant, number of invalid pods (without seeds) per plant, number of branches (with more than two mature pods) per plant, number of nodes (from cotyledon to apex of plant) per plant, and internode length (cm; between two adjacent nodes).

Data Processing and Statistics
The coefficient of variation, minimum, maximum, mean, and genetic diversity index of each trait were calculated in Excel. The genetic diversity index was calculated as the Shannon-Weaver index, $H' = -\sum_i P_i \ln P_i$, where $P_i$ is the proportion of accessions in the $i$-th class of a trait [12]. Correlation analysis, principal component analysis, and cluster analysis were carried out in SPSS. Significant differences were evaluated using one-way ANOVA and Duncan's test at p ≤ 0.05.
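The two descriptive statistics named above can be sketched in a few lines of Python. This is only an illustration, not the authors' Excel workflow: the function names are hypothetical, the coefficient of variation is assumed to use the sample standard deviation, and the 100-seed weights are made-up values. The flower-color proportions use the 90.50% purple share reported in this study and assume the remaining 9.50% are white.

```python
import math
import statistics


def shannon_index(proportions):
    """Shannon-Weaver index H' = -sum(P_i * ln P_i); empty classes are skipped."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


def coefficient_of_variation(values):
    """Coefficient of variation in percent (sample SD divided by the mean)."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Flower color: 90.50% purple (reported), 9.50% white (assumed remainder).
flower_color = [0.9050, 0.0950]
print(round(shannon_index(flower_color), 2))  # 0.31

# Hypothetical 100-seed weights (g), for illustration only.
seed_weights = [1.2, 1.5, 1.8, 2.1, 2.4]
print(f"CV = {coefficient_of_variation(seed_weights):.2f}%")
```

With these inputs the index for flower color comes out at 0.31, which agrees with the lowest diversity index reported for the qualitative traits.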

Statistical Analysis of Qualitative Traits
The qualitative trait data for the 242 wild soybean samples are listed in Table 2 and Supplementary Materials Table S1. The most frequent traits were wild type (78.51%), purple flower (90.50%), needle leaves (39.26%), black seed color (83.88%), brown hilum (52.07%), and mud film (87.60%). The diversity index varied from 0.31 (flower color) to 1.14 (leaf shape), with an average of 0.62.
In addition, we calculated the percentage of each qualitative trait among the 242 samples from the four sites (Table 3). The samples from the northern site had mainly purple flowers (96.59%), needle leaves (67.05%), black seeds (98.86%), brown hila (62.5%), and mud film (96.59%), while most accessions from the western site showed purple flowers (97.44%), ellipse leaves (53.85%), black seeds (97.44%), black hila (61.54%), and mud film (97.44%). The germplasms from the eastern site were dominated by purple flowers (84.31%), ellipse leaves (49.02%), black seeds (68.63%), brown hila (52.94%), and mud film (76.47%). Most samples from the southern site had purple flowers (82.81%), oval leaves (53.13%), black seeds (67.19%), brown hila (52.69%), and mud film (78.13%).
The quantitative traits are summarized in Table 4 and Supplementary Materials Table S2. A total of five agronomic traits (i.e., 100-seed weight, seed weight per plant, number of seeds per plant, number of effective pods, and number of invalid pods) were significantly different among samples (p < 0.05) (Table 5). Samples from the southern site showed the highest 100-seed weight (3.26 g), seed weight per plant (30.03 g), and number of branches (6.00). In contrast, the northern site was characterized by the lowest 100-seed weight (1.62 g), seed weight per plant (11. […]
Means followed by the same letter in the same row are not significantly different (p > 0.05), while means followed by different letters in the same row are significantly different (p < 0.05).
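The site comparison above rests on one-way ANOVA. The following sketch shows how the F statistic behind such a comparison is formed. It is an illustration under stated assumptions and not the SPSS procedure used in the study: the function name and the per-site 100-seed weights are hypothetical, and neither the p-value (which requires the F distribution) nor Duncan's multiple range test is computed.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical 100-seed weights (g) for three sites, for illustration only.
site_weights = [
    [1.5, 1.6, 1.7, 1.6],
    [2.0, 2.2, 2.1, 2.3],
    [3.1, 3.3, 3.2, 3.4],
]
f_stat, df_b, df_w = one_way_anova_f(site_weights)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the critical value for the given degrees of freedom indicates that at least one site mean differs. Which sites differ is then settled by a post hoc test such as Duncan's.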

The Principal Component, Cluster and the Correlation Analysis
The results of principal component analysis (PCA) based on all agronomic traits are shown in Figure 3 and Table 6. Cluster analysis divided the samples into two groups, Groups I and II (Table 7 and Figure 3). The average variation in Groups I and II was 50.92% and 41.30%, respectively. The number of effective pods per plant, number of seeds per plant, and number of nodes in Group II were significantly higher than those in Group I.
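To make the PCA step more concrete, the sketch below extracts the first principal component from a correlation matrix by power iteration, using only the Python standard library. It is not the SPSS analysis reported here: the function names and the five rows of trait values are hypothetical, only the leading component is computed, and the clustering step is not shown.

```python
import math


def standardize(rows):
    """Convert each column to z-scores (mean 0, sample SD 1)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z_rows):
    """Correlation matrix computed from standardized rows."""
    n, k = len(z_rows), len(z_rows[0])
    return [[sum(r[a] * r[b] for r in z_rows) / (n - 1) for b in range(k)]
            for a in range(k)]


def first_principal_component(matrix, iterations=200):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    k = len(matrix)
    v = [1.0 / (i + 1) for i in range(k)]  # asymmetric start vector
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * matrix[a][b] * v[b]
                     for a in range(k) for b in range(k))
    return eigenvalue, v


# Hypothetical rows: (effective pods, seeds per plant, 100-seed weight in g).
traits = [
    (120, 300, 1.5),
    (200, 520, 1.2),
    (90, 210, 2.8),
    (260, 640, 1.1),
    (150, 380, 2.0),
]
corr = correlation_matrix(standardize(traits))
eigenvalue, loadings = first_principal_component(corr)
print(f"PC1 explains {eigenvalue / len(corr):.1%} of the total variance")
print("PC1 loadings:", [round(x, 2) for x in loadings])
```

Because the traits are standardized first, the eigenvalues of the correlation matrix sum to the number of traits, so dividing the leading eigenvalue by that number gives the share of variance explained by the first component.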
The correlation coefficients between all agronomic traits are listed in Table 8. No significant correlation was observed between 100-seed weight and number of effective pods per plant (r = 0.04) or between 100-seed weight and number of nodes (r = −0.55). The 100-seed weight showed a significantly negative correlation with number of seeds per plant (r = −0.18). Most other agronomic traits showed significantly positive correlations with each other (p < 0.01), except for number of branches and number of seeds per plant (r = 0.15, p < 0.05) and number of seeds per plant and internode length (r = 0.16, p < 0.05), indicating a close linkage between these agronomic traits.
Means followed by the same letter in the same row are not significantly different (p > 0.05), while means followed by different letters in the same row are significantly different (p < 0.05).
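The significance thresholds attached to these correlation coefficients can be checked with a short calculation. The sketch below is an illustration only: it assumes the correlations were computed over all 242 accessions (the text does not state the sample size used), the function names are hypothetical, and the p-value uses a normal approximation to the t distribution, which is close for samples of this size but not exact.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Approximate two-sided p-value for H0: no correlation.

    Uses t = r * sqrt(n - 2) / sqrt(1 - r^2) and treats t as standard
    normal, which is a large-sample approximation.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return 2 * (1 - NormalDist().cdf(abs(t)))


N_ACCESSIONS = 242  # assumption: one observation per accession
for r in (0.04, 0.15, 0.16, -0.18):
    print(f"r = {r:+.2f}  approx. p = {approx_p_value(r, N_ACCESSIONS):.4f}")
```

Under these assumptions, r = 0.04 is not significant, r = 0.15 and r = 0.16 fall between the 0.01 and 0.05 thresholds, and r = −0.18 is significant at the 0.01 level.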

Mining and Screening of Specific Germplasms
A total of 27 specific germplasms were screened based on eight agronomic traits (Table 9).

Discussion
The study of genetic diversity is important and necessary for understanding gene flow, evolutionary history, and time of domestication of crop plants [13]. In this study, our objective was to identify important phenotypic characteristics of wild soybeans from Heilongjiang Province for breeding of soybeans. Genetic diversity provides useful information about the diversity of plant species and greatly helps us systematically study crop breeding and understand evolutionary relationships [14]. Furthermore, understanding of the genetic diversity of wild germplasms can reveal the genetic structure, evolution, and geographical distribution of crops, and identify social factors that guide the breeding of cultivated crops. Moreover, wild germplasms with high genetic variations ultimately provide excellent genetic resources to improve the beneficial characteristics of cultivated soybeans [15]. Generally, the wild soybean is considered to be the direct progenitor of cultivated soybeans, suggesting the importance of wild soybeans in breeding programs of soybean improvement [16].
The genetic diversity of wild soybeans has been studied based on morphological traits [17][18][19]. In this study, we investigated the genetic diversity of 242 wild soybean samples from four geographic sites in Heilongjiang Province, based on both qualitative and quantitative agronomic traits, in order to advance our understanding of wild soybean germplasms and their conservation strategies. All wild soybean habitats were considered in the collecting sites. Previous studies have explored the genetic diversity of wild soybeans in Heilongjiang Province based on limited samples and ecological characteristics [18,20]. These studies showed rich genetic diversity in wild soybeans, which could be used to improve the characteristics of cultivated soybeans. Similarly, other studies have analyzed the genetic diversity of wild soybean from Heilongjiang Province, but were based on limited populations and lacked sufficient experimental replicates [21,22]. In our study, we investigated the genetic diversity of wild soybeans based on a comprehensive dataset of 242 samples collected in four agricultural sites of Heilongjiang Province, using six qualitative traits and eight quantitative traits, for two consecutive years (2012 and 2013) at the Academy of Agricultural Sciences National Agricultural Demonstration Zone of Heilongjiang Province. The results of clustering based on PCA showed that all samples were divided into two groups, with samples from different sites clustered together (Figure 3). These results may be attributed to the fact that agronomic variation is regulated by genotype, environment, and gene-environment interactions. Furthermore, sample size and collecting location can also contribute to the differentiation of wild soybean populations.
Heilongjiang Province has a complicated geographic environment, spanning about 14 degrees of longitude and 10 degrees of latitude, and harbors a variety of wild soybean ecotypes owing to its unique ecological conditions [23]. Studies have shown that the high genetic diversity of wild soybeans, derived from long-term selection, has allowed wild soybeans to adapt to various types of ecological habitats [24]. In the present study, higher values of several agronomic traits (i.e., 100-seed weight, seed weight per plant, number of seeds per plant, number of effective pods, number of invalid pods, number of nodes, and internode length) were observed in 2012 than in 2013. These results may be attributed to the low temperature and wet weather in 2013, which negatively affected the growth of wild soybeans during the growing season [25]. In this study, the morphological traits of wild soybean samples were compared among the four agricultural divisions (Agricultural Division Office of Heilongjiang Province, 1985). Most samples had needle leaves, small seeds, no evident main stem, low 100-seed weight, and a high diversity index, traits that are generally considered typical wild characteristics. These results are consistent with those reported previously [26]. The diversity index H of 0.62 for the qualitative traits of the 242 soybean samples (Table 2) suggested low genetic diversity and is lower than that reported previously [27], while significant differences in qualitative traits were revealed among the four agricultural sites, suggesting a strong association between geographic distribution and genetic clustering (Table 3).
Specifically, the northern site of Heilongjiang Province is located in the Da Xing An Ling and Xiao Xing An Ling regions, where the mountains are round and vast, the climate is cold, and the natural environments are scarcely populated. Compared with the other sites, the northern site has a shorter growing season for crops, due to limited terrain, climate conditions, and human factors [28]. The northern site may therefore be expected to contain rich wild soybean resources. This expectation is supported by our results, which showed rich wild soybean resources in the northern site. Therefore, it is recommended that the wild soybean resources in the northern site be protected in situ. Previous studies have shown that Southern China might be the major center of diversity of the annual wild soybeans, simply due to frequent artificial partitions in wild soybeans among the southern provinces, in addition to natural selection [29]. Furthermore, the wild soybeans in the other three collecting sites showed ellipse and oval leaves with big seeds, white flowers, and a main stem, which may be attributed to rapid agricultural development and human intervention accelerating the evolution of these wild soybeans, as reported previously [26].
In the past decade, the selection and conservation of wild soybeans have attracted increasing attention worldwide, particularly because they are highly adaptive to various harsh environments compared with cultivated soybeans [27]. It is well known that wild soybeans contain sufficient genetic variation to adapt to many geographic, abiotic, and biotic environmental conditions [29]. Therefore, wild soybean germplasms are generally considered a potential genetic bank for cultivated soybeans, to improve their ability to cope with future climate change. Our results showed that several agronomic traits were positively correlated with each other, such as number of effective pods per plant and seed weight per plant, number of seeds per plant and number of effective pods per plant, and 100-seed weight and seed weight per plant, indicating a strong association between these agronomically beneficial traits; that is, these traits are linked to high soybean yield [30]. Thus, it is recommended that these traits be selected in future breeding programs aimed at enhancing soybean yield.
Genetic variation has been established in the course of evolution, in response to adaptation to environmental changes, generating typical phenotypic traits, which are in turn used as fundamental standards for investigating genetic diversity and screening desired soybean resources [31]. For example, significant phenotypic variation in wild soybeans has been identified in several southern provinces, including Jiangsu, Henan, Shanxi, and Anhui [32][33][34], indicating the existence of a rich pool of wild soybean resources in China. In our study, continuous variation in the phenotypic traits was identified within the same agricultural site, with different degrees of variation: the average variation was 59% and 94% in 2012 and 2013, respectively, while the variation coefficients ranged from 21% (internode length) to 94% (seed weight per plant). These values are much higher than those reported previously for a total of 210 wild soybean samples from Jiangsu Province, with an average variation of 26.68% and variation coefficients ranging from 10.00% to 49.54% [32]. Our results showed that the average diversity index was 1.83 and 1.77 in 2012 and 2013, respectively, with the individual diversity indices ranging from 1.73 (number of invalid pods) to 2.09 (number of effective pods). These values are much higher than those reported previously for 305 wild soybeans from Henan Province, with individual diversity indices ranging from 0.1187 to 1.0903 [33]. These results indicate that the wild soybeans in Heilongjiang Province contain high genetic diversity.
Furthermore, high values of 100-seed weight, seed weight per plant, number of effective pods, and number of branches were identified among the wild soybeans in the eastern, western, and southern sites, indicating that wild soybeans from multiple agricultural sites of Heilongjiang Province contain rich genetic diversity and suggesting their potential use in the genetic improvement of soybean. Based on these traits, a total of 27 wild soybean samples were screened; these can be used as breeding resources for gene identification and soybean improvement.
Results of clustering analysis based on PCA showed that all wild soybean samples were divided into two groups (Groups I and II), each containing samples from all four collection sites, so that samples from different sites were clustered together. These results may be attributed to the fact that agronomic variation is regulated by various factors, including genotype, environment, and gene-environment interaction. Furthermore, sample size and collecting location can also contribute to the differentiation of wild soybean populations. Group II was characterized mainly by higher values of number of effective pods per plant, number of seeds per plant, and number of nodes, while Group I was characterized by 100-seed weight. These results indicate that there is no significant correlation between the clustering of wild soybean samples and their collection sites; that is, the clustering does not follow the agricultural and ecological divisions of Heilongjiang Province. The results also suggest either that geographical distance is not the most important factor causing the genetic differentiation of wild soybeans or that frequent gene flow occurred within the same agricultural site of Heilongjiang Province. Therefore, these wild soybean resources should be conserved locally in situ.

Conclusions
In this study, a total of 242 wild soybean samples were evaluated based on 14 traits, which showed a large amount of variation. The wild soybeans from different sites of Heilongjiang Province contained rich genetic diversity, with significant potential for genetic improvement and breakthroughs in soybean breeding. Therefore, it is recommended that these wild soybean resources be conserved in situ.