Genetic Diversity and Population Structure Analysis of Castanopsis hystrix and Construction of a Core Collection Using Phenotypic Traits and Molecular Markers

Castanopsis hystrix is a valuable native, fast-growing, broad-leaved tree in South China. In this study, 15 phenotypic traits and 32 simple sequence repeat (SSR) markers were used to assess the genetic diversity and population structure of a natural population of C. hystrix and to construct a core germplasm collection from a set of 232 accessions. The results showed that the original population of C. hystrix had relatively high genetic diversity, with the number of alleles (Na), effective number of alleles (Ne), observed heterozygosity (Ho), expected heterozygosity (He), Shannon’s information index (I), and polymorphism information content (PIC) averaging 26.188, 11.565, 0.863, 0.897, 2.660, and 0.889, respectively. Three sub-populations were identified based on a STRUCTURE analysis, indicating a strong genetic structure. The phylogenetic and population structure results showed a high level of agreement, with the 232 germplasms being classified into three main groups. The analysis of molecular variance (AMOVA) indicated that 96% of the total variance was derived from within populations, revealing low differentiation among populations. A core collection composed of 157 germplasms was then constructed for the first time; its diversity parameters did not differ significantly from those of the original population. These results revealed the genetic diversity and population structure of C. hystrix germplasms and have implications for germplasm management and genome-wide association studies on C. hystrix, as well as for establishing core collections in other wood-producing hardwood species.


Introduction
Information on genetic diversity is important for understanding the extent of genetic variability in existing plant material and for the breeding and conservation of genetic resources [1,2]. However, tree breeding usually involves the recurrent selection of genetically superior materials, which may alter diversity levels in breeding populations [3]. Various types of markers can be used to estimate genetic diversity. In the past, phenotypic traits were widely used for this purpose; however, they are influenced by the environment, so diversity cannot be evaluated accurately from them alone. In recent decades, DNA molecular markers have been increasingly exploited for genetic diversity studies. They can be employed to investigate levels of genetic diversity among categories such as cultivars and closely related species in germplasm banks [4,5].

Experimental Materials
The germplasm gene bank was established in 2003. Since 1999, the C. hystrix Genetic Improvement Research Collaboration Group, composed of the Guangdong Academy of Forestry and other institutions, has systematically investigated, selected, and sampled superior trees from 17 provenance areas in which C. hystrix germplasm resources are relatively concentrated, according to the distribution of existing C. hystrix resources, and has collected C. hystrix germplasm resources from them. Superior trees were selected by the five-dominant-tree method, and one tree was selected every 30-50 m. Depending on the distribution area, 10-30 superior trees were generally selected from each provenance. After selecting the superior trees, we collected seeds and sun-exposed branches of that year from the south side of the middle and upper part of the crown from October to November 2001. The branches were sent directly to the grafting site, where 30-50 plants were grafted for each superior tree. The grafted trees were transplanted to the gene bank in the spring of 2003, with five plants planted for each superior tree; the plant row spacing was 3 m × 5 m, and the planting holes measured 60 cm × 60 cm × 40 cm.
A set of 232 accessions of C. hystrix was collected from across the whole range of the species (Figure 1). The accessions came from 17 provenances, located in Guangxi (5 provenances), Guangdong (4 provenances), Fujian (3 provenances), Hainan (2 provenances), Yunnan (2 provenances), and Hunan (1 provenance). At present, all of them are preserved in the Maofeng Mountain C. hystrix germplasm gene bank (113° 46′ E and 23° 29′ N), Baiyun District, Guangzhou City, Guangdong Province. From November to December 2018, 3 trees were selected for each clone, and 15 phenotypic traits were investigated, including 5 growth traits, 2 morphological traits, and 8 wood properties (Table S1); the detailed passport data are presented in Table 1. DNA extraction and SSR genotyping had been completed by the laboratory at an earlier stage. Finally, 32 pairs of SSR markers were chosen (Tables S2 and S3). For the specific methods and steps of the test, refer to the paper by Yang [35].

Population Structure, Principal Coordinate Analysis, and Evolutionary Tree Analysis
An analysis of molecular variance (AMOVA) was carried out to determine the relative partitioning of the total genetic variation among and within different groups of genotypes using GenAlEx v6.5. The principal coordinate analysis (PCoA) was also performed using GenAlEx v6.5. The genetic structure of unique genotypes was investigated using STRUCTURE v2.3.4 software [41] under an admixture model with 10,000 burn-in steps followed by 10,000 Markov Chain Monte Carlo iterations. Twenty independent runs were performed for each number of genetically homogeneous clusters (K = 1-10). The most probable K value was determined by the highest ∆K [42] in STRUCTURE HARVESTER v0.6 software [43] and used to estimate the membership coefficient of each clone. The web tool iTOL (https://itol.embl.de/ (accessed on 2 July 2021)) was used for data visualization. Additionally, to analyze the relationships of the 232 germplasms, a genetic distance matrix between the clones was generated, and an unrooted phylogenetic tree was constructed using the neighbor-joining method in PowerMarker 3.25 software [44].
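As an illustration of how the ∆K criterion singles out a value of K, the following minimal sketch computes ∆K from replicate log-likelihoods, taking ∆K as the absolute second difference of the mean ln P(D|K) divided by the standard deviation of ln P(D|K) across runs. The function name `evanno_delta_k` and the log-likelihood values are hypothetical and are not output from this study; the sketch is not the STRUCTURE HARVESTER implementation.

```python
from statistics import mean, stdev


def evanno_delta_k(ln_prob):
    """Compute delta K for each interior K.

    ln_prob maps K -> list of ln P(D|K) values, one per replicate run.
    Delta K is undefined for the smallest and largest K tested.
    """
    delta_k = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue  # no neighbours on both sides, so no second difference
        second_diff = abs(
            mean(ln_prob[k + 1]) - 2 * mean(ln_prob[k]) + mean(ln_prob[k - 1])
        )
        sd = stdev(ln_prob[k])
        delta_k[k] = second_diff / sd if sd > 0 else float("inf")
    return delta_k


# Invented replicate log-likelihoods for K = 1..5 (three runs each).
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-800.0, -801.0, -799.0],
    4: [-790.0, -795.0, -785.0],
    5: [-780.0, -790.0, -770.0],
}
dk = evanno_delta_k(toy_runs)
best_k = max(dk, key=dk.get)  # K with the highest delta K
```

In this toy input the log-likelihood stops improving sharply after K = 3, so the second difference, and hence ∆K, peaks there.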
Core Finder v1.1 [46] and Core Hunter 3 [47] software were used to establish a core collection from the molecular marker data; with the latter, various sampling ratios were set. An independent t-test was used to assess the significance of differences in genetic diversity parameters between the core collection and the original collection [48]. If the differences were not significant, the constructed core collection was considered representative of the original collection. A principal coordinates analysis (PCoA) was performed to generate a distribution map of the core and original collections in order to evaluate the core collection.

Genetic Diversity Analysis of the Original Population
A total of 335 alleles were detected by 32 polymorphic microsatellite markers, with an average of 10.458 alleles detected at each locus. The largest number, 15 alleles, was detected at the SSR04 locus, while SSR02, SSR10, SSR15, SSR18, SSR23, and SSR27 had the fewest, with eight alleles each (Table 2). The average number of effective alleles (Ne) at all loci was 7.115, ranging from 3.832 to 10.931. The average Shannon diversity index (I) was 2.035, ranging from 1.597 to 2.469. The average observed heterozygosity (Ho) was 0.861, ranging from 0.736 to 0.990. The average expected heterozygosity (He) of the 32 loci was 0.824, ranging from 0.704 to 0.894, indicating that C. hystrix had a high level of diversity. The average polymorphism information content (PIC) was 0.889, ranging from 0.744 to 0.958. According to the standard of PIC ≥ 0.5 [49], all of the above loci showed high polymorphism. The average coefficient of genetic differentiation (Fst) of all loci in this study was 0.081, ranging from 0.056 to 0.138, indicating slight genetic differentiation at the loci. The average value of gene flow (Nm) was 2.948, ranging from 1.561 to 4.200; gene flow thus varied considerably among loci but was greater than 1 at every locus, which indicates a high degree of gene exchange among the C. hystrix populations. The results showed that the genetic diversity of these SSR loci was generally high; combining the values of each genetic parameter, SSR04 had the highest genetic diversity and SSR10 the lowest.
In order to analyze the genetic diversity of the 17 C. hystrix populations, the genetic parameters Na, Ne, Ho, He, and I were calculated for each population (Table 3). The average number of alleles (Na) was 10, ranging from 3 to 15. The average number of effective alleles (Ne) was 7.115, ranging from 2.821 to 9.519. The average Shannon diversity index (I) was 2.035, ranging from 1.025 to 2.395. The mean observed heterozygosity (Ho) was 0.861, ranging from 0.766 to 0.923. The average expected heterozygosity (He) was 0.824, ranging from 0.609 to 0.876. These results indicated that the overall genetic diversity of C. hystrix was high and that the level of variation was rich.
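For readers unfamiliar with these parameters, the following minimal sketch shows how they are commonly computed for a single diploid SSR locus from allele frequencies p_i: Ne = 1/Σp_i², He = 1 − Σp_i², I = −Σp_i ln p_i, PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j², and Nm = (1 − Fst)/(4Fst). The function names and the toy genotypes are assumptions made for illustration; the values in this study were obtained with the software named in the methods, which may apply small-sample corrections not shown here.

```python
from collections import Counter
from math import log


def locus_diversity(genotypes):
    """Diversity parameters for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    alleles = [allele for pair in genotypes for allele in pair]
    counts = Counter(alleles)
    freqs = [c / len(alleles) for c in counts.values()]
    sum_p2 = sum(p * p for p in freqs)
    he = 1.0 - sum_p2
    # Cross term of the PIC formula, summed over all allele pairs i < j.
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return {
        "Na": len(freqs),                       # number of alleles
        "Ne": 1.0 / sum_p2,                     # effective number of alleles
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "He": he,                               # expected heterozygosity
        "I": -sum(p * log(p) for p in freqs),   # Shannon's information index
        "PIC": he - cross,                      # polymorphism information content
    }


def gene_flow(fst):
    """Gene flow estimated from Fst as Nm = (1 - Fst) / (4 * Fst)."""
    return (1.0 - fst) / (4.0 * fst)


# Invented genotypes (allele sizes in bp) for four individuals.
toy_locus = [(100, 102), (100, 100), (102, 104), (104, 104)]
stats = locus_diversity(toy_locus)
```

Under this relationship, the largest per-locus Fst reported above (0.138) corresponds to Nm ≈ 1.56, in line with the smallest reported Nm.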

Population Structure of the Original Population
The population structure of the 232 accessions was estimated using STRUCTURE software based on the 32 SSR markers. First, the number of subpopulations (K) was identified based on the maximum likelihood and ∆K values; ∆K reached its highest value at K = 3, which indicated that the whole population was divided into three subgroups (Figure 2). The three subgroups were designated as Q1, Q2, and Q3 (indicated in red, green, and blue, respectively, in Figure 3). At K = 3, the division was as follows: group I included 100 accessions; group II contained 57 accessions; and group III contained 75 accessions (Table S4, Figure S1). Admixture occurred between the clusters, indicating a certain degree of gene exchange among the populations.
A principal component analysis was performed on the SSR data of the 232 C. hystrix germplasms to visualize the relationships between genotypes. A three-dimensional scatter plot was created based on the value of each sample on the first (PC1), second (PC2), and third (PC3) principal components (Figure 4). The first, second, and third principal components explained 8.4%, 6.3%, and 4.8% of the total genetic variability, respectively. The scattered dots of different colors in the PCA figure represent samples of different populations, and the results show that the accessions are clustered together, indicating that the differences among these accessions are small.
A neighbor-joining analysis was performed, and the 232 germplasms were classified into three main groups, designated as groups I, II, and III (Figure 5). The distribution of the C. hystrix germplasms in the inferred groups is shown in Table S5. The first main group (I) included 61 germplasms, the second main group (II) included 68 germplasms, and the third main group (III) included 103 germplasms. From a geographic origin perspective, some of the germplasms from the same geographic origin clustered in different groups (Table S4). This shows that the three groups classified by phylogenetic analysis contained germplasms from different geographical locations. The neighbor-joining dendrogram based on the genetic distance between individual trees was used to determine the genetic relationships among C. hystrix accessions, and the result was similar to that of the structure analysis at K = 3.
In order to understand the level of genetic differentiation and the source of variation among C. hystrix populations, the variation was partitioned into two levels (among populations and within populations), and an analysis of molecular variance (AMOVA) was performed on the C. hystrix populations (Table 4). The results showed that the genetic variation of C. hystrix mainly came from individuals within populations: 96% of the variation was within populations, whereas 4% was among populations.

Core Collection Establishment and Evaluation
Using QGAStation software, a total of 126 core collections were constructed based on phenotypic data. Among the 126 core collections, three with CR < 80% were removed (Figure 6, Table S6); three core collections had a CR of 10%. The remaining 123 core collections had a mean difference (MD) of <20% and a coincidence rate of range (CR) of >80%, indicating that these 123 core collections are good representations of the genetic diversity of the original collection. According to the maximum variance difference (VD) and rate of variation in the coefficient of variation (VR) values at each sampling ratio (Table 5), the VR had its maximum value at the 10% sampling ratio, and the CR and VD had their maximum values at the 15% sampling ratio. With the increase in sampling proportion, the VR gradually decreased. Therefore, 15% is the optimal sampling ratio. Under the preferred sampling method (D3), the CR of the constructed core collection was 100%, whereas the CR obtained with the deviation sampling method (D2) was lower than 100%. Therefore, the best sampling method is the preferred sampling method (D3). At the 15% sampling ratio, the VD value for C3 was greater than that for C2, and the VR values were similar; thus, the best clustering method was determined to be the mediate distance method (C3). We conclude that the core collection generated at the 15% sampling ratio (B2C3D3) is the best core collection, which has 32 clones.
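To make two of these evaluation parameters concrete, the sketch below computes CR and VR under the assumption that they follow the definitions commonly used for core collections: CR is the mean over traits of the core range divided by the initial range, and VR is the mean over traits of the core coefficient of variation divided by the initial coefficient of variation, both expressed as percentages. The function names, trait names, and values are hypothetical; MD and VD additionally require per-trait significance tests and are not shown. This is not the QGAStation implementation.

```python
from statistics import mean, stdev


def coincidence_rate(core, initial):
    """CR%: mean ratio of core range to initial range across traits.

    core, initial: dict mapping trait name -> list of trait values.
    """
    ratios = [
        (max(core[t]) - min(core[t])) / (max(initial[t]) - min(initial[t]))
        for t in initial
    ]
    return 100.0 * mean(ratios)


def variable_rate(core, initial):
    """VR%: mean ratio of core CV to initial CV across traits."""
    def cv(values):
        return stdev(values) / mean(values)

    ratios = [cv(core[t]) / cv(initial[t]) for t in initial]
    return 100.0 * mean(ratios)


# Invented values for two traits; the core keeps both extremes of each trait.
initial = {"height": [10, 12, 14, 16, 18, 20], "dbh": [5, 6, 7, 8, 9, 10]}
core = {"height": [10, 14, 20], "dbh": [5, 8, 10]}
cr = coincidence_rate(core, initial)
vr = variable_rate(core, initial)
```

A core that retains the extreme values of every trait has a CR of 100%, and a VR above 100% indicates that the relative spread of the traits is larger in the core than in the initial collection.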
Genes 2022, 13, x FOR PEER REVIEW 10 of 18
Figure 6. Percentage of trait differences between the core collections and the initial collection obtained by different combinations. B1 and B2 represent Euclidean distance and Mahalanobis distance, respectively; C1, C2, and C3 represent the unweighted pair-group average method, Ward's method, and mediate distance method in the systematic clustering, respectively. D1, D2, and D3 represent the random sampling, deviation sampling, and preferred sampling methods, respectively. Seven sampling ratios of 10%, 15%, 20%, 25%, 30%, 35%, and 40% were set. MD, VD, CR, and VR denote the percentage of the mean difference, percentage of the variance difference, coincidence rate of the range, and rate of variation in the coefficient of variation, respectively. * represents the maximum for each sampling ratio in the figure.
Core Finder software was used to analyze the SSR data with the M strategy based on the principle of maximizing alleles. The core collection Mc1, with a sample size of 158, retained 100% of the allele number of the original collection and had increased genetic parameters (Ne, He, I, and PIC); thus, Mc1 with its strong genetic diversity should be a good representation of the original collection. In addition, Core Hunter 3 was used to determine the optimal size of the core collection using 10 preset sampling ratios (Table 6). The six genetic diversity parameters (Na, Ne, Ho, He, I, and PIC) of the core collection and the original collection were compared. We found that the core collections with a sampling ratio of less than 50% had Na and Ho values significantly different from those of the original collection; thus, the optimal sampling ratio is 55% (H-55, referred to henceforth as Mc2). To determine which of Mc1 and Mc2 was the better core collection, we analyzed the Ne, I, Ho, He, and PIC of the constructed core collections (Table 7). The core collections Mc1 and Mc2 preserved 68.1% and 55.2% of the original collection resources, respectively. The t-test results showed that the six genetic diversity parameters of Mc1 and Mc2 were not significantly different from those of the original collection, indicating that both core collections could be a good representation of the original population. Based on the values summarized in Tables 6 and 7, the Ne, He, I, and PIC values of Mc1 obtained by Core Finder were all higher than those of the original collection. This indicates that the genetic redundancy in the original collection was removed from Mc1 and the corresponding genetic diversity parameters were increased. Only the He and PIC values of Mc2 obtained by Core Hunter 3 were higher than those of the original collection, and the retention rates of all six genetic parameters were lower than those of Mc1. Moreover, the Na retention of Mc1 generated by Core Finder was 99.9%, the retention rates of Ne, He, I, and PIC were all above 100%, and its PIC was ≥ 0.5, which denotes high polymorphism, indicating that it is a good core collection. Therefore, the best core collection was identified as Mc1, which has 158 clones.
Finally, a total of 157 clones were obtained by combining the phenotypic core collection (B2C3D3-15) and molecular core collection (Mc1) into the final core set (BM). In order to check whether the BM could effectively represent the genetic diversity of the whole germplasm, a principal coordinates analysis (PCoA) was used to generate a distribution map of the core and original collections from the SSR data (Figure 7). The results showed that the distributions of the core collection and the original collection largely coincide in the middle part, indicating that this part of the core collection is a good representation of the original collection. However, there was deviation in the upper right and lower right parts.

Genetic Diversity of C. hystrix Germplasm Resources
Characterizing breeding collection germplasms is crucial in plant breeding, as the genetic advancement of economically valuable traits relies on the genetic diversity available within the breeding gene pool. Knowledge of genetic diversity also helps to minimize the use of closely related clones as parents in breeding programs. Genetic diversity is an integral part of all biological diversity; it is the basis of biological evolution and species differentiation and is of great significance for population maintenance, reproduction, and adaptation to habitat changes. The higher the genetic diversity, the more likely a population is to adapt to different environments, and variations in DNA sequences are the primary drivers of such diversity [50]. Molecular markers provide powerful tools for genetic diversity analyses and the establishment of core collections. In recent years, various molecular markers such as RAPD, SSR, and ISSR have been used to analyze the phylogeny, inter-species relationships, and genetic diversity of forest species including Pinus leucodermis, Eucalyptus globulus, Swietenia macrophylla, and Populus deltoides [51–54]. It has been reported that SSRs are abundant and ubiquitous in prokaryotic and eukaryotic genomes [55,56]. SSRs offer breeding programs markers with far higher resolution than traditional approaches that depend solely on pedigrees [7] or phenotypic data [57]. Consequently, SSRs have become one of the most popular marker types.
C. hystrix is a precious native timber species and an efficient, multi-purpose, fast-growing tree in South China. Knowledge of its genetic diversity, population structure, and molecular markers may accelerate the selection of desirable traits in C. hystrix. Nei's gene diversity, the observed heterozygosity and expected heterozygosity, the Shannon-Wiener index, the polymorphism information content, etc., have all been used to evaluate the level of genetic diversity of plant species [50]. The high number of alleles obtained in some studies may be due to the use of a large amount of highly diversified plant material [58,59] as well as the high number of samples employed in the analysis. In the present study, a total of 335 alleles were revealed using 32 SSR markers, with an average of 10 alleles per locus, revealing a high level of variability within the sample set. This high average number of alleles per locus can be attributed to the high genetic diversity of the investigated genotypes. The PIC value affords a fairer estimation of diversity than the actual number of alleles because it takes into account the relative frequencies of each allele present [60,61]. In our study, the overall average PIC for the SSR loci was 0.889, with PIC values for individual SSRs ranging from 0.744 to 0.958. SSRs with PIC values from 0.25 to 0.5 are considered moderately informative [49]; because every locus here exceeded 0.5, all 32 SSRs can be regarded as highly informative. A similar result was reported for Xanthoceras sorbifolium, and higher PIC and genetic diversity scores were reported in studies using SSRs [62]. Among the 17 C. hystrix populations, the P2 population (Bobai, Guangxi) had the highest genetic diversity, while the P13 population (Jianghua, Hunan) had the lowest. The genetic diversity of the P13 population was significantly lower than that of the other populations, which may be due to the small number of samples. As previously demonstrated, the SSR assay approach is appropriate for genetic relationship studies [63,64], and it proved to be an efficient tool for assessing the genetic diversity of C. hystrix and identifying its populations in China.

SSR-Based Genetic Relationships among C. hystrix Germplasm Resources
Population structure is an important component in association mapping analyses between molecular markers and traits. Differences in population genetic structures reflect genetic diversity and convey the adaptation potential of a species to its changing environment [65]. To understand the genetic relationships and population genetic structure of the C. hystrix germplasm at the genomic level, the SSR data of 232 germplasms were analyzed by STRUCTURE analysis, neighbor-joining (NJ) cluster analysis, and PCoA. Based on the SSRs and these multiple analyses, the population was clustered into three groups, which is more than in previous studies using SSR and ISSR molecular markers [27,66]. However, the NJ and Bayesian model-based clusterings did not assign the germplasm accessions to identical groups: although both yielded three groups, the group memberships differed somewhat, which may be caused by the different clustering methods. We found that the results of the phylogenetic analysis and the population structure analysis were basically consistent and complemented one another, but the accessions did not cluster completely according to geographical origin. This is mainly because the elite germplasms used in this test were all selected and obtained from the local gene pool. Moreover, in the long-term selection process, germplasms from different provinces were introduced or exchanged. Wang et al. [67] reported similar results in a study conducted on 119 Xanthoceras sorbifolium accessions.
AMOVA is a satisfactory approach for evaluating the variation within and among populations. It is generally believed that woody plants that are widely distributed, perennial, and outcrossing, and whose seeds are dispersed by wind or by birds and other animals, have higher levels of genetic diversity and that the genetic diversity within their populations is richer than that between populations [68–70]. Consistent with this view, the genetic variation both between and within populations was significant (p < 0.001), and the within-population variation (96%) was far greater than that between populations (4%); the genetic variation within populations was therefore the primary source of the total variation. This indicates that there is little genetic differentiation among the populations, which matches the results recorded in previous studies [27,28]. The finding that within-population variation accounted for the largest proportion of genetic variability (Table 4) was also similar to the results of previous studies by Li et al. [66] and Belaj et al. [71,72]. In conclusion, the genetic variation mainly came from within populations, the genetic diversity within populations was rich, and the genetic differentiation among populations was small. This may be so for the following reasons: (1) C. hystrix is an outcrossing plant mainly pollinated by wind, and its wide distribution provides opportunities for gene recombination and produces rich genetic diversity. At the same time, gene exchange between populations is promoted and the differentiation between populations is reduced. (2) Plant genetic diversity is related to environmental adaptation [73]. C. hystrix has a wide distribution range and strong adaptability. Under long-term natural selection, a wide range of genetic variation has been produced, resulting in the formation of geographical provenances with different phenotypes and different requirements for environmental conditions. (3) The level of genetic diversity within populations is also influenced by the number of samples [74,75], and the genetic diversity within a population is proportional to the number of samples [76].

Core Collection Construction and Evaluation
In this study, the genetic diversity, population structure, population differentiation, and core collection of C. hystrix resources were evaluated using SSR molecular markers, which identified rich genetic diversity among the C. hystrix germplasm within populations. Although core collections of many tree species have been established, a core collection of C. hystrix had not previously been constructed. According to the method of Hu [20], when MD% ≤ 20% and CR% > 80%, the core collection can be considered to recapitulate the genetic diversity of the original collection. The smaller the MD% and the larger the VD%, CR%, and VR%, the more representative the core collection. The retention ratio of alleles should be greater than 70%, and the larger the other genetic parameters, the better [77,78]. We successfully established a core collection of C. hystrix with a 100% allelic representation based on 15 phenotypic traits and 32 SSR markers.
Because germplasm resources differ in the richness of their diversity, the sampling ratio used for core collections varies; for woody plants, it has ranged from 10.00% to 45.00% [13,79,80]. In this study, seven sampling ratios were investigated (10%, 15%, 20%, 25%, 30%, 35%, and 40%), and the final sampling ratio of the best C. hystrix phenotypic core collection was 15%, which is consistent with previous studies [81]. While phenotypic data can often reflect the genetic diversity of materials, perennial trees are vulnerable to environmental impacts. Molecular marker technology, in contrast, has the advantages of low cost and fast data acquisition, and it is not affected by external factors. Constructing the core collection by combining the genetic diversity and phenotypic variation of the original population [82–84] can improve its effectiveness.
Based on the M strategy, Core Finder software selects core collections by maximizing the number of alleles at each locus, which can eliminate genetic duplication in materials during construction and screen for materials with a large number of alleles and low redundancy. Core Hunter 3 mainly screens the core collection by maximizing genetic diversity and allele richness, and different sampling ratios can be set. In this study, a C. hystrix core collection was constructed using Core Finder and Core Hunter 3 software. The results showed that the retention rates of Core Finder for the four genetic parameters Na, Ne, Ho, and I were higher than those of Core Hunter 3 (Table 7). Moreover, the Ne, He, I, and PIC values of Mc1 obtained by Core Finder were higher than those of the original collection; this was expected, as diversity increases with the elimination of genetically similar accessions during core collection development [85]. In conclusion, Core Finder software is more suitable for the construction of the C. hystrix core collection. Consistent with this study, Gong et al. [86] constructed an astragalus core collection based on 380 astragalus samples using different methods, such as the M strategy-based method in Core Finder and the stepwise sampling-based method in Core Hunter 3; the authors concluded that Core Finder software combined with the M strategy was the most suitable method for constructing the astragalus core collection. In this study, a core C. hystrix germplasm set, BM, composed of 157 C. hystrix accessions, was constructed based on 15 phenotypic traits and 32 SSR markers. The results of the principal coordinates analysis showed that some of the core collection overlapped with the original collection and some accessions were scattered around it. The reason may be that the amount of data was too large when Core Finder software was used to analyze the SSR data and extract the core collection. In the future, we can try to use other software to construct a core collection based on the SSR data and compare the results.
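The principle of maximizing allele capture can be illustrated with a simple greedy heuristic: repeatedly add the accession that contributes the most alleles not yet represented, until every allele is captured. This is only a sketch of the general idea and is not the algorithm implemented in Core Finder or Core Hunter 3; the function names, accession names, and allele data below are hypothetical.

```python
def greedy_allele_core(accessions):
    """Select accessions until all alleles are captured.

    accessions: dict mapping accession name -> set of (locus, allele) pairs.
    Returns the selected accession names in order of selection.
    """
    all_alleles = set().union(*accessions.values())
    captured, core = set(), []
    while captured != all_alleles:
        # Pick the accession adding the most new alleles; ties are broken
        # alphabetically because candidates are scanned in sorted order.
        best = max(sorted(accessions), key=lambda n: len(accessions[n] - captured))
        core.append(best)
        captured |= accessions[best]
    return core


def allele_retention(core, accessions):
    """Percentage of all alleles in the collection that the core retains."""
    all_alleles = set().union(*accessions.values())
    kept = set().union(*(accessions[name] for name in core))
    return 100.0 * len(kept) / len(all_alleles)


# Invented data: four accessions scored at two loci.
toy = {
    "A": {("SSR01", 100), ("SSR01", 102), ("SSR02", 200)},
    "B": {("SSR01", 100), ("SSR02", 200)},
    "C": {("SSR01", 104), ("SSR02", 202)},
    "D": {("SSR01", 102), ("SSR02", 200)},
}
core = greedy_allele_core(toy)
retention = allele_retention(core, toy)
```

In this toy input, accessions B and D carry only alleles already present in A, so they are redundant for allele capture and are left out of the core.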

Conclusions
In this study, the genetic diversity, population structure, population differentiation, and core collection of C. hystrix resources were evaluated using SSR molecular markers. The results showed that the genetic diversity of these SSR loci was rich. Moreover, the C. hystrix samples were grouped into three clusters. We successfully established a core collection, BM, by combining 15 phenotypic traits and 32 SSR molecular markers. We demonstrated that SSR markers were effective for assessing the genetic diversity and structure of the C. hystrix populations. The established core collection can be used for future genome-wide association analysis and breeding program research. This study provides a theoretical basis for germplasm resource management as well as the conservation and utilization of C. hystrix germplasm resources.