Genetic Diversity and Population Differentiation of Pinus koraiensis in China

Pinus koraiensis is a well-known precious tree species in East Asia with high economic, ornamental and ecological value. More than fifty percent of the P. koraiensis forests in the world are distributed in northeast China, a region with abundant germplasm resources. However, these natural P. koraiensis sources are in danger of genetic erosion caused by continuous climate changes, natural disturbances such as wildfire and frequent human activity. Little work has been conducted on the population genetic structure and genetic differentiation of P. koraiensis in China because of the lack of genetic information. In this study, 480 P. koraiensis individuals from 16 natural populations were sampled and genotyped. Fifteen polymorphic expressed sequence tag-simple sequence repeat (EST-SSR) markers were used to evaluate genetic diversity, population structure and differentiation in P. koraiensis. Analysis of molecular variance (AMOVA) of the EST-SSR marker data showed that 33% of the total genetic variation was among populations and 67% was within populations. A high level of genetic diversity was found across the P. koraiensis populations, and the highest levels of genetic diversity were found in HH, ZH, LS and TL populations. Moreover, pairwise Fst values revealed significant genetic differentiation among populations (mean Fst = 0.177). According to the results of the STRUCTURE and Neighbor-joining (NJ) tree analyses and principal component analysis (PCA), the studied geographical populations cluster into two genetic clusters: cluster 1 from Xiaoxinganling Mountains and cluster 2 from Changbaishan Mountains. These results are consistent with the geographical distributions of the populations. The results provide new genetic information for future genome-wide association studies (GWAS), marker-assisted selection (MAS) and genomic selection (GS) in natural P. koraiensis breeding programs and can aid the development of conservation and management strategies for this valuable conifer species.
This study investigated the genetic diversity and population structure of natural populations in northeast China, and proposed some conservation strategies for this valuable conifer species. This study is the first comprehensive report of the genetic diversity of natural P. koraiensis populations in China. We found that the existing P. koraiensis populations in China maintain high levels of genetic diversity, which provides a foundation for germplasm innovation and genetic improvement of P. koraiensis. The population genetic analysis in this study identified two independent genetic units (Liangshui and Helong populations) that exhibit high degrees of genetic differentiation. The populations distributed in the Xiaoxinganling Mountains are highly genetically diverse and may represent the central population of natural P. koraiensis in China. Furthermore, the genetic structure of P. koraiensis populations identified in this study is consistent with the geographical distribution of these populations in China. These results have significance for the protection of natural P. koraiensis germplasm resources in China as well as for developing improved genotypes through breeding. Our findings provide genetic information useful for future genome-wide association studies (GWAS), marker-assisted selection (MAS) and genomic selection (GS) studies. It is therefore recommended that further research on genetic improvement for timber and cone production be conducted using marker-assisted selection and/or genomic selection, and that genotype-by-environment interaction studies be carried out to identify suitable site-specific genotypes.


Introduction
Pinus koraiensis (Sieb. et Zucc), commonly known as Korean pine, is a perennial evergreen tree in the Pinaceae family with five needles per fascicle [1][2][3]. It is an ancient and valued forest tree in East Asia, and natural forests of this species have undergone long-term succession and are described as tertiary forest [4]. Compared with other Pinus species, P. koraiensis is long-lived and is a dominant species in mixed conifer and broadleaved forest [5,6]. Currently, P. koraiensis is distributed mainly in cool-temperate regions in northeast China, the Russian Far East, the Korea peninsula (note that information is not available from North Korea due to limited access) and Honshu, Japan. It typically occurs in mild regions with more than 70% humidity and at altitudes from 600 m to 1500 m [7,8]. However, in China, it only grows from the Changbai Mountains to Xiaoxinganling Mountains in northeast China, mainly on slopes and rolling hills and in river valleys [9]. Nearly half of the germplasm resources of P. koraiensis in the world are found in Xiaoxinganling Mountains in Yichun city, China, where the largest and most undisturbed primeval forest remaining in Asia and a natural climax community of P. koraiensis exists [10].
P. koraiensis has high economic, ornamental and ecological values in East Asia. Timber of P. koraiensis is widely used for architecture, bridges, furniture and ships because of the light, soft, fine structure and straight texture of the wood and its strong corrosion resistance [11]. Furthermore, it produces edible nuts that are nutritious and distinctly flavored, containing abundant unsaturated fatty acids, vitamins and minerals [12]. It also has high medicinal value, able to lower cholesterol levels and allay ultraviolet injury and tiredness [13]. Natural P. koraiensis forest absorbs large amounts of carbon dioxide and contributes to climate change regulation [14]. Therefore, it is a prominent conifer tree species of great value for the maintenance and protection of the environment in East Asia.
Genetic improvement of P. koraiensis began in the 1960s, which then developed slowly due to a lack of systematic breeding strategies and objectives [8]. In the early stages of selective breeding, large numbers of superior trees or natural populations were selected from natural forest to establish primary seed orchards, mainly through phenotype selection [15][16][17]. Earlier studies have mainly focused on propagation technology [18], provenance division [19], progeny determination [20] and selection of improved varieties [21], while studies of molecular plant breeding, including studies of genetic diversity, genomic selection and construction of genetic maps, are lacking [22]. Existing natural forests of this species have great significance for the conservation of breeding materials, the development of gene resources and the study of population genetic diversity [23]. However, in the past few decades, with the increasing demand for wood and cones of P. koraiensis as well as increasing wildfire, the area of natural P. koraiensis forest has decreased extensively [24]. Thus, to protect existing natural forests under the background of illegal logging and unpredictable biotic stress, such as white pine blister rust diseases, the collection and evaluation of germplasm resources of P. koraiensis are urgently needed.
Genetic diversity and population structure are key parameters of population genetics research. Analyses of genetic variation among and within populations can guide the formulation of conservation strategies. The use of molecular markers identified from whole-genome, chloroplast genome and transcriptome analysis is a primary method of revealing genetic diversity and population structure. Many DNA molecular markers are codominant and highly polymorphic, and many have been identified in the genome and transcriptome, unlike morphological and biochemical markers [25][26][27]. Simple sequence repeats (SSRs) are considered powerful and advantageous molecular tools due to their low cost, easy detection by polymerase chain reaction (PCR), high polymorphism, and codominance. Thus, they can be used for genetic diversity analysis, genome-wide association analysis, core collections and genetic linkage map construction in many plants and animals [28][29][30]. Furthermore, multiple EST-SSR markers can easily be developed from microsatellite loci of public transcriptome data. At present, there are few reports of analyses of genetic diversity in P. koraiensis based on DNA molecular markers; studies to date have employed random amplified polymorphic DNA (RAPD) analysis [31], single primer amplification reaction (SPAR) [32], inter-simple sequence repeat (ISSR) analysis [33,34] and expressed sequence tag-simple sequence repeat (EST-SSR) analysis [35]. All these studies have identified high levels of genetic diversity in P. koraiensis, with the greatest levels of genetic differentiation occurring within populations. However, those previous studies were limited in the number of populations sampled, the number of molecular markers used and the population sizes examined. Thus, a systematic and comprehensive population genetic study, involving widespread germplasm collection and abundant polymorphic markers developed from high-throughput sequencing, is necessary to study the genetic relationships and diversity of P. koraiensis populations.
In this study, germplasm resources from 480 individuals of 16 natural populations of P. koraiensis were collected within the species' main distribution area in northeastern China and analyzed for genetic diversity using 15 EST-SSRs. This study is the first comprehensive evaluation of the genetic diversity and population structure of P. koraiensis in China using a large sample, a wide geographic coverage and a sufficient number of molecular markers. The aims of the study were to (1) investigate genetic variation using polymorphic EST-SSRs, (2) evaluate the genetic diversity and structure of natural populations, (3) conduct a comprehensive, range-wide genetic diversity study of P. koraiensis in China, and (4) propose a conservation strategy. The hypothesis of the study was that high genetic diversity could be detected within populations and that significant genetic differentiation could exist among populations due to the restricted natural distribution of the species and a low to moderate degree of gene flow between populations. Thus, the results will provide insights into the conservation of this species and lay a foundation for further studies of marker-assisted selection (MAS) and genomic selection (GS) in P. koraiensis for genetic improvement.

Genetic Diversity at Different Loci among Populations
The genetic diversity analysis was performed on 480 individuals from 16 natural P. koraiensis populations using 15 EST-SSR markers (Table 1). The allele size ranged from 151 bp at locus NEPK-65 to 301 bp at loci NEPK-168 and NEPK-184. In total, 155 alleles across all 15 loci were detected in the sampled individuals; the number of alleles per locus ranged from 4 (NEPK-67) to 21 (NEPK-145), with a mean value of 10.33. There were 58 private alleles, accounting for 37.42% of the alleles. The number of effective alleles (Ne) ranged from 1.170 at locus NEPK-40 to 6.605 at locus NEPK-145, with an average of 2.514 per locus. The observed (Ho) and expected (He) heterozygosity ranged from 0.008 to 0.984 and from 0.145 to 0.849, respectively, with mean values of 0.374 and 0.521, respectively. The polymorphic information content (PIC) varied from 0.142 (NEPK-40) to 0.833 (NEPK-145), with a mean value of 0.461. Four loci exhibited high polymorphism (PIC > 0.5) and 8 loci exhibited moderate polymorphism (0.2 < PIC < 0.5). In addition, across the 480 samples, all of the loci conformed to Hardy-Weinberg equilibrium. F-statistics were calculated to detect genetic subdivision; they revealed moderate inbreeding, and the mean value of Fst was 0.347, indicating moderate genetic variation. Regarding gene flow, the number of effective migrants (Nm) ranged from 0.080 to 17.691 among populations, with an average of 2.667.
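The per-locus statistics reported above (Ne, He and PIC) can all be derived from the allele frequencies at a locus. The following is a minimal sketch, assuming the standard textbook definitions (He as one minus the sum of squared allele frequencies, Ne as its reciprocal complement, and PIC following Botstein et al.); the source does not state which software settings or sample-size corrections were applied, and the function name `locus_diversity` is hypothetical.

```python
from typing import Dict, Sequence


def locus_diversity(freqs: Sequence[float]) -> Dict[str, float]:
    """Return He, Ne and PIC for one locus from its allele frequencies.

    Assumes uncorrected estimators; real software may apply
    small-sample corrections that this sketch omits.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")

    # Expected homozygosity under random mating: sum of p_i^2
    homozygosity = sum(p * p for p in freqs)

    he = 1.0 - homozygosity   # expected heterozygosity
    ne = 1.0 / homozygosity   # effective number of alleles

    # PIC = 1 - sum p_i^2 - sum_{i<j} 2 * p_i^2 * p_j^2
    n = len(freqs)
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    pic = he - cross

    return {"He": he, "Ne": ne, "PIC": pic}


if __name__ == "__main__":
    # Hypothetical locus with two equally frequent alleles
    print(locus_diversity([0.5, 0.5]))
```

Because PIC subtracts an extra term from He, it is never larger than He for the same locus, which matches the pattern of the mean values reported above (PIC 0.461, He 0.521).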

Genetic Diversity within Pinus koraiensis Populations
The levels of genetic diversity in the 16 populations are shown in Table 2. Across the sampled populations, the number of different alleles (Na) varied from 2.667 (HL) to 4.467 (TL), with a mean value of 3.271, and the number of effective alleles (Ne) ranged from 1.

Genetic Variation among Pinus koraiensis Populations
To evaluate the genetic variation among the collected samples, AMOVA was performed, and Fst among natural populations, genetic clusters and geographical regions were calculated; the results are shown in Table 3. The AMOVA results indicate that 67% of the total genetic variation existed within populations, indicating high genetic diversity within populations. AMOVA of the two genetic clusters identified by the STRUCTURE analysis indicated that 63.79% of the total variation was attributable to differences within populations, and the overall Fst was 0.362 (Fst > 0.25), indicating high genetic differentiation between the two clusters. In addition, the AMOVA of two groups classified according to geographical location indicated low genetic variation among populations within each group (2.77%). All of these results indicated high genetic differentiation within populations and groups. The Nei's genetic distance and pairwise Fst values are shown in Table 4. Fst was considered the main genetic parameter for evaluating genetic differentiation among populations. In this study, the pairwise Fst values ranged from 0.014 to 0.348, and most of the P. koraiensis population pairs exhibited high values (Fst > 0.15), indicating high levels of genetic differentiation. The greatest level of differentiation was observed between populations Helong and Liangshui, and the lowest was observed between Jiaohe and Hulin. The highest genetic distance was observed between populations Helong and Liangshui (0.813), consistent with the pairwise Fst values and indicating pronounced differentiation between these two populations. The relative migration network among the 16 P. koraiensis populations was constructed using relative migration rate with the divMigrate function in R software. Analysis of gene flow between populations suggested a biased geographic distribution, and gene flow was not uniform among all populations (Figure 1). A high degree of gene flow was observed among three populations located near one another (Muling, Maoershan and Fangzheng), consistent with the principal coordinate analysis and dendrogram analysis. In addition, one genetically isolated population (Boli) displayed high levels of gene flow with the three nearby populations Muling, Maoershan and Fangzheng. Moreover, a moderate level of gene flow was found among three admixed populations, and two genetically distinct populations (Zhanhe and Wangqing) exhibited distant segregation from the other populations.
Table 4. Pairwise genetic differentiation index values (below the diagonal) and Nei's genetic distance values (above the diagonal). **** indicates the diagonal division of the pairwise genetic differentiation index values and Nei's genetic distance values.
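To make the Nei's genetic distance values in Table 4 more concrete, the sketch below computes a distance between two populations from per-locus allele frequencies. It assumes Nei's standard distance, D = -ln(I), where I is the normalized genetic identity; the source does not say which variant of Nei's distance was used, and the function name and input layout are hypothetical.

```python
import math
from typing import Dict, List

# One dict per locus, mapping allele label -> frequency in the population
LocusFreqs = Dict[str, float]


def nei_standard_distance(pop_x: List[LocusFreqs], pop_y: List[LocusFreqs]) -> float:
    """Nei's standard genetic distance D = -ln(Jxy / sqrt(Jx * Jy)).

    Jx, Jy and Jxy are summed over loci; the 1/L averaging factor
    cancels in the ratio, so sums are sufficient.
    """
    if len(pop_x) != len(pop_y):
        raise ValueError("both populations must cover the same loci")

    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        jx += sum(p * p for p in fx.values())
        jy += sum(q * q for q in fy.values())
        # Alleles missing from one population contribute zero
        jxy += sum(p * fy.get(allele, 0.0) for allele, p in fx.items())

    if jxy == 0.0:
        return math.inf  # no shared alleles at any locus

    identity = jxy / math.sqrt(jx * jy)
    return -math.log(identity)


if __name__ == "__main__":
    # Hypothetical single-locus frequencies, for illustration only
    a = [{"151": 1.0}]
    b = [{"151": 0.5, "153": 0.5}]
    print(nei_standard_distance(a, b))
```

Two populations with identical allele frequencies give a distance of zero, and the value grows as shared alleles become rarer, which is why the largest distance in the table coincides with the most differentiated population pair.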

Population Structure
The population structure analysis of the 16 natural P. koraiensis populations was performed based on a Bayesian approach using STRUCTURE software. The number of clusters within the range of 1 to 10 was evaluated for 10 repetitions in each run. In the structure plot (Figure 2), the maximum delta K value appeared at K = 2, with an obvious peak apparent at this value; this value was considered the optimal genetic cluster number for all EST-SSR markers ( Figure 2B,C). The 480 sampled individuals of P. koraiensis were divided into two genetic groups (Group 1 and Group 2) at K = 2: Group 1 comprised 149 individuals from 5 populations (Heihe, Liangshui, Zhanhe, Tieli and Hegang), and Group 2 comprised a higher number of individuals (331) from 11 populations (Liangzihe, Helong, Lushuihe, Linjiang, Jiaohe, Hulin, Boli, Muling, Maoershan, Fangzheng and Wangqing). Group 1 comprised almost all of the P. koraiensis plant materials from Xiaoxinganling Mountains, whereas Group 2 comprised almost all of the individuals from Changbaishan Mountains, suggesting a relationship between genetic structure and geographical distribution of the populations.
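The choice of K = 2 from the delta K peak can be illustrated with a short calculation. The sketch below assumes the delta K statistic of Evanno et al. computed from the replicate log-likelihoods Ln P(D) that STRUCTURE reports, using the means across replicates for the second-order difference; the log-likelihood values in the example are invented toy numbers, not results from this study, and the function name is hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(ln_prob: Dict[int, List[float]]) -> Dict[int, float]:
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    ln_prob maps K -> list of Ln P(D) values, one per replicate run.
    delta K = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K))
    """
    means = {k: mean(values) for k, values in ln_prob.items()}
    result: Dict[int, float] = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue  # the end points of the tested range have no delta K
        sd = stdev(ln_prob[k])
        if sd == 0.0:
            continue  # undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


if __name__ == "__main__":
    # Toy replicate log-likelihoods (hypothetical, two runs per K)
    toy = {
        1: [-100.0, -102.0],
        2: [-80.0, -82.0],
        3: [-79.0, -81.0],
        4: [-78.5, -80.5],
    }
    scores = delta_k(toy)
    best_k = max(scores, key=scores.get)
    print(scores, best_k)
```

Note that delta K cannot be evaluated at the smallest or largest K tested, so with a tested range of 1 to 10 the statistic is only defined for K = 2 to 9.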

To further analyze cluster patterns, principal component analysis (PCA) based on the pairwise genetic distance matrix of 15 EST-SSRs was performed; the results are shown in Figure 3. The 480 individuals from the 16 populations were roughly divided into two clusters according to the first two axes in the PCA plot. Principal axes 1 and 2 accounted for 22.99% and 12.46%, respectively, of the total genetic variation among the individuals, together accounting for 35.45% of the total genetic variation (Figure 3A). Five populations (Heihe, Liangshui, Zhanhe, Tieli and Hegang) were grouped into cluster 1, and the remaining populations (Liangzihe, Helong, Lushuihe, Linjiang, Jiaohe, Hulin, Boli, Muling, Maoershan, Fangzheng and Wangqing) were grouped into cluster 2. The same clustering was obtained in the STRUCTURE analysis using the same dataset, indicating marked genetic differentiation. Furthermore, the Neighbor-joining (NJ) dendrogram based on Nei's genetic distance clustered the 480 P. koraiensis individuals from the 16 populations into 2 clusters, consistent with the above results (Figures 3B and 4).

Correlations between Genetic Distance and Geographic Distance
The genetic distance estimated based on molecular markers may be related to the distribution of the species under study and the geographic distance between individuals or populations. In this study, the geographic distance and genetic distance values ranged from 37.72 km to 825.45 km and from 0.02 to 0.83, respectively. To investigate the correlations between genetic distance and geographic distance, the Mantel test was carried out. The results showed that genetic distance was not significantly correlated with the geographic distance among the P. koraiensis populations (p = 0.26, R² = 0.01), indicating a lack of association between geographical distance and the genetic differentiation of P. koraiensis (Figure 5). Liangzihe and Hegang populations exhibited the lowest geographic distance and were not grouped in the same cluster. Therefore, there was no obvious isolation by genetic and geographical distance among the sampled populations.
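The Mantel test used in this section compares two distance matrices by permuting the population labels of one of them. The following is a minimal, standard-library sketch of a one-sided permutation Mantel test on the Pearson correlation; it is not the implementation used in the study, the number of permutations and the seed are arbitrary assumptions, and the matrices in the example are toy values.

```python
import random
from math import sqrt
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Upper-triangle entries of m after relabelling rows/columns by `order`."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(gen: Matrix, geo: Matrix, permutations: int = 999,
           seed: int = 0) -> Tuple[float, float]:
    """Return (r, p): observed correlation and one-sided permutation p-value."""
    n = len(gen)
    identity = list(range(n))
    geo_vec = _upper_triangle(geo, identity)
    r_obs = _pearson(_upper_triangle(gen, identity), geo_vec)

    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute population labels of the genetic matrix
        if _pearson(_upper_triangle(gen, perm), geo_vec) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value


if __name__ == "__main__":
    # Toy symmetric matrices for four hypothetical populations on a line
    geo = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
    gen = [[0.1 * v for v in row] for row in geo]
    print(mantel(gen, geo, permutations=99, seed=1))
```

Permuting rows and columns together, rather than shuffling individual matrix cells, is what keeps the test valid despite the non-independence of pairwise distances.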

Discussion
To understand the genetic differentiation of forest tree populations and contribute to the development of effective breeding strategies, comprehensive evaluations of natural germplasm resources of individual species are essential; such evaluations can accelerate breeding strategies and industrial development [36,37]. In nature, P. koraiensis mainly grows in the cold temperate zone, especially in northeast China, and natural forests of this species have been shown to be sensitive to climate factors. Thus, to conserve genetic resources of this species, it is important to obtain data on its genetic diversity and population structure. In the present study, we conducted a population genetic analysis using codominant molecular markers, representing the first such analysis in P. koraiensis. The results can help guide the genetic improvement and resource conservation of this important gymnosperm.

Genetic Diversity
Genetic diversity has been increasingly evaluated in species lacking a reference genome, including some conifers [38], endemic species [39] and endangered plants [40]. Studies of genetic diversity can provide insight into speciation and genetic variation within and among populations and can aid the development of conservation strategies. However, transcriptome data and molecular markers remain lacking for P. koraiensis; the available genetic data provide few markers suitable for the study of population genetics in this species. Evaluating the germplasm resources of this species represents the first step towards understanding the genetics of natural P. koraiensis populations. A high level of genetic diversity in natural P. koraiensis populations was detected in this study, with mean values of 10.33 and 0.521 for Na and He, respectively. High genetic diversity was observed in the Heihe population in the northern Xiaoxinganling Mountains, possibly due to less human disturbance in this region than in other areas. According to a previous study, a PIC value equal to or more than 0.5 indicates high genetic information for genetic markers. In the present study, the PIC values obtained for the multiallelic EST-SSR markers ranged from 0.142 to 0.833, with a mean value of 0.461, indicating a high level of genetic information among the 480 P. koraiensis individuals from the 16 natural populations. The genetic diversity of P. koraiensis obtained in the present study is higher than that reported for Pinus bungeana (Na = 3.70, He = 0.36) [41], Pinus dabeshanensis (He = 0.36) [42] and Pinus yunnanensis (Na = 4.10, He = 0.43) [43] but lower than that reported for Pinus tabulaeformis (Na = 6.52, He = 0.68) [44]. The genetic diversity of a species may vary with characteristics such as adaptability, pollination mechanism and population size [45][46][47]. 
The observed genetic diversity in the present study might be attributable to the genetic background, life history and population dynamics of P. koraiensis. Previous studies have found that P. koraiensis has a large population size, long life cycle, strong adaptability, long pollination distance and large genome size, and it has a complex genetic background, which allows it to generate high genetic diversity [48][49][50][51][52]. Natural selection under changing environmental conditions is likely to lead to differences in genotype frequency among populations. In addition, previous studies have suggested that evaluations of genetic diversity are limited by low numbers of populations and molecular markers [53,54]. For instance, analyses of the genetic diversity of natural P. koraiensis populations have been conducted using a variety of molecular marker techniques. Kim et al. [53] analyzed allozyme loci variation and found a moderate level of genetic diversity among natural P. koraiensis populations in Korea. The genetic diversity of natural P. koraiensis populations detected by allozyme markers (He = 0.18) in Russia [54] was much lower than that identified using EST-SSR markers in this study. These findings indicate that P. koraiensis maintains high genetic diversity worldwide. The level of genetic diversity detected in this study is similar to that detected based on nine EST-SSR markers in seven natural populations of P. koraiensis in northeast China (He = 0.610) [35]. The Xiaoxinganling Mountains of China are considered the distribution center for P. koraiensis, possessing abundant germplasm resources and ancient founding stocks and maintaining considerable numbers of individuals. In addition, the genetic diversity of P. koraiensis populations from Xiaoxinganling Mountains was higher than that of the Changbaishan Mountains populations, with high expected heterozygosity and abundant private alleles found for the former populations (Figure 4). All these results indicate that Xiaoxinganling Mountains may be the center of genetic diversity of P. koraiensis in China.

Population Genetic Differentiation
Detection of genetic differentiation is a key process in the genetic improvement of forest trees. Regarding the estimation of genetic differentiation, past studies have considered an Fst value higher than 0.15 but lower than 0.25 to indicate significant divergence [55][56][57]. In the present study, the genetic differentiation assessed by EST-SSRs among P. koraiensis populations ranged from 0.014 to 0.348, with a mean value of 0.177, indicating significant differentiation among populations in China. However, previous studies reported low genetic differentiation among populations as assessed by allozyme loci variation in Korea (Fst = 0.06) [53] and Russia (Fst = 0.015) [54] and by EST-SSRs in China (Fst = 0.02) [35]. In addition, Kim et al. [31] studied the genetic variation of P. koraiensis in Korea, Russia and China using allozymes and RAPDs and detected small differences among the three regions. Different degrees of genetic differentiation were observed in natural P. koraiensis populations in these countries, with low Fst values. The main reason for these differences is that earlier studies analyzed only limited numbers of natural populations and molecular markers. The genetic differentiation index (Fst) is correlated with gene flow (Nm). Generally, the greater the degree of differentiation, the weaker the gene flow, i.e., a lower gene migration rate among populations [57][58][59][60]. Natural P. koraiensis forests that originated in Siberia and northeast Asia have undergone regeneration, succession and migration over millions of years [61,62]. After the Quaternary glaciation, many species died out, but the P. koraiensis forests persisted into the present and underwent a range of changes and varying degrees of differentiation. In natural P. koraiensis populations, low levels of genetic differentiation have been observed in Korea [31], whereas high genetic differentiation has occurred in northeast China, which may have contributed to the rich P. koraiensis germplasm resources (representing approximately 60% of the world's total) and broad distribution area (more than 3000 hectares) in this country. The mean He (0.521) across all loci was greater than Ho (0.374), indicating a high heterozygosity among the sampled populations of P. koraiensis. This high heterozygosity is attributable to the fact that Pinus species exhibit cross-pollination and wind pollination. Furthermore, the AMOVA suggested that most of the genetic variation (more than 60%) in P. koraiensis exists within populations, with a small proportion occurring among populations; these findings are consistent with findings in other cross-pollinating species.
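The inverse relationship between Fst and Nm described above can be written down explicitly. The sketch below assumes Wright's island-model approximation, Nm = (1 - Fst) / (4 Fst); the source does not state which formula or software produced its Nm values, so this is an illustration of the relationship rather than a reproduction of the reported figures. The function name is hypothetical.

```python
def nm_from_fst(fst: float) -> float:
    """Approximate number of effective migrants per generation.

    Uses Wright's island-model relation Nm = (1 - Fst) / (4 * Fst),
    which assumes equilibrium between migration and drift.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("Fst must lie in the interval (0, 1]")
    return (1.0 - fst) / (4.0 * fst)


if __name__ == "__main__":
    # Extremes of the pairwise Fst range reported in this study
    for fst in (0.014, 0.348):
        print(f"Fst = {fst:.3f} -> Nm = {nm_from_fst(fst):.3f}")
```

Under this approximation Nm falls below 1 once Fst exceeds 0.2, a commonly used rule of thumb for the point at which gene flow no longer offsets drift.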

Population Structure and Gene Flow
Analyses of population structure can provide insight into population size, breeding system, extent of isolation and population migration or gene flow [63,64]. Furthermore, such analyses can help reveal the relationships between genetic variation and environmental stresses and enhance our understanding of evolution. Evaluating population structure is a key component of genome-wide association studies (GWAS) and marker-assisted selection (MAS) [65]. P. koraiensis is mainly distributed in the Xiaoxinganling and Changbaishan Mountains in northeast China, areas with a humid climate. Due to the environmental conditions, the germplasm resources of P. koraiensis from different locations display high phenotypic and genetic variation. The STRUCTURE analysis identified two groups (optimal K = 2) among the 16 natural populations, with 5 populations in one group and the remainder in the other. Similar results were obtained in the PCA and the neighbor-joining (NJ) dendrogram analysis, indicating genetic differentiation of P. koraiensis populations in China. Interestingly, individuals from the Xiaoxinganling Mountains were clustered into one group, occupying a northern area, which makes them more likely to represent an ancestral group. Furthermore, the samples from the Changbaishan Mountains and the adjacent ridge region were clustered into the other group; the populations corresponding to these samples are distributed in a southern area and exhibit different degrees of genetic differentiation and gene flow. However, some of the individuals from the Xiaoxinganling Mountains were assigned to cluster 2, although the majority were assigned to cluster 1 (Figure 3A,B).
The main potential reasons for this finding are as follows: (1) these two mountain regions are close to each other, and some hybridization events may occur; (2) for populations separated by a short spatial distance, the probability of gene flow is high, which will affect population genetic structure; and (3) pollen and seed dispersal occur over long distances in this species, which promotes gene flow. The genetic structure of the natural P. koraiensis populations in China determined in this study is consistent with the current geographical distribution of these populations. Furthermore, the findings are consistent with previous studies showing that populations in similar geographical locations or environments tend to cluster into the same group [66,67]. In this study, although some admixture was detected by the STRUCTURE and PCA analyses, the population dendrogram also suggested two subgroups, with 5 populations in cluster 1 and 11 populations in cluster 2.
Gene flow among populations is closely related to geographical distance and effective population size and can generate new genetic combinations, potentially enhancing species resilience and persistence [68][69][70]. In plants, migration, or gene flow, is achieved via seeds, pollen and other propagules, and influences the genetic diversity and differentiation among independent evolutionary units [71,72]. We found that two genetically distinct populations (Zhanhe and Wangqing) exhibited segregation from other populations, which may be related to their geographic distance from other populations (approximately 565 km), limiting the level of gene flow between them. These independent units play an important role in maintaining the genetic diversity of this species. This interpretation is consistent with previous studies demonstrating that isolated populations of plants with long-distance pollination may have higher levels of genetic diversity than large contiguous populations [73][74][75]. Moreover, high levels of gene flow were found among the Helong, Maoershan and Fangzheng populations. Such gene flow can reduce the effects of artificial selection or genetic drift and promote the maintenance of genetic information. Similar results were obtained for Camelina sativa accessions [65]. Previous studies have also found that extensive gene flow can alter the gene frequencies in populations and thereby affect genetic diversity and structure. In our study, although a strong correlation between gene flow and geographic distance between populations was observed, some degree of gene flow was also evident between geographically distant populations. In addition, geographic distance was not correlated with genetic distance in the natural P. koraiensis populations in this study, suggesting that geographic distribution may not be a determinant factor for the genetic structure of populations.
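The text does not state how the relationship between geographic and genetic distance was tested. One common choice for comparing two pairwise distance matrices is a Mantel permutation test, and the following is a minimal sketch of that idea under stated assumptions. The function names, the toy 4 × 4 matrices and the number of permutations are hypothetical and are not taken from this study.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(geo, gen, permutations=999, seed=0):
    """One-sided Mantel test on two symmetric distance matrices.

    Population labels of `geo` are shuffled (rows and columns together)
    to build the null distribution of the correlation coefficient.
    """
    n = len(geo)
    pairs = list(combinations(range(n), 2))  # upper triangle only
    gen_vec = [gen[i][j] for i, j in pairs]
    observed = pearson([geo[i][j] for i, j in pairs], gen_vec)

    rng = random.Random(seed)
    order = list(range(n))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [geo[order[i]][order[j]] for i, j in pairs]
        if pearson(permuted, gen_vec) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical toy data: four populations on a line (positions 0, 1, 3, 7),
# with genetic distance exactly proportional to geographic distance.
geo_toy = [[0, 1, 3, 7], [1, 0, 2, 6], [3, 2, 0, 4], [7, 6, 4, 0]]
gen_toy = [[0.01 * d for d in row] for row in geo_toy]
r_toy, p_toy = mantel(geo_toy, gen_toy)
```

With real data, `geo` would hold pairwise geographic distances (km) and `gen` the pairwise Nei's genetic distances or Fst values for the 16 populations, in the same population order.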

Conservation and Management Strategies
Evaluations of germplasm resources are needed to maintain abundant genetic variation and high levels of genetic diversity of some species of interest and establish sound conservation strategies. Our population genetic analysis revealed that the populations distributed in the Xiaoxinganling Mountains (Zhanhe, Heihe, Liangshui, Tieli and Hegang) exhibit high levels of genetic diversity and moderate levels of gene flow (Figure 4). These populations represent the core populations and have stronger environmental adaptability and evolutionary potential than the other populations, and they can be considered independent genetic units. Hence, measures such as in situ conservation should be implemented for conserving natural P. koraiensis resources. In addition, the marginal populations represent special germplasm resources. They are characterized by low genetic diversity but have high levels of genetic differentiation relative to the other natural populations. Habitat fragmentation can reduce gene flow among populations, leading to a loss of genetic diversity. In this study, the Helong population, which occurs in a marginal area, should be targeted for conservation measures, such as ex situ measures. In addition, the greatest level of population differentiation was observed between Helong and Liangshui populations, indicating that these populations can be considered independent units. Therefore, regulations and management strategies must be established to protect the natural habitat of this species and prohibit harvest. More importantly, a national-level core germplasm resource library of P. koraiensis should be established by the government, with the objectives of maintaining genetic variation, improving plant adaptability to environmental changes and developing new breeding materials. Under these measures, the existing natural P. koraiensis populations in China can be protected and be better used as a source of resources for genetic improvement in the future.
From a forest management perspective, efforts should be made to regulate timber harvesting in these populations in order to reduce the loss of genetic diversity. In particular, marginal populations such as the Helong population, which are characterized by low genetic diversity, should be given the highest management priority through enrichment planting of individuals from different populations to enhance genetic diversity within the population. Forest management should also focus on the suppression of wildfire in these forests, as population fragmentation driven by wildfire can reduce gene flow among populations, leading to a loss of genetic diversity. In addition, these forests should be protected from pests and diseases, such as the white pine blister rust that affects the trees.

Plant Materials and Genomic DNA Extraction
In this study, 16 populations of P. koraiensis from Jilin Province (J) and Heilongjiang Province (H) were considered. A total of 480 samples were collected from these populations, which occur throughout the natural distribution areas in northeastern China (Table 5). Nucleic acids were extracted from needles using the improved cetyltrimethyl ammonium bromide (CTAB) method described by Li et al. [76]. DNA quality and concentration were evaluated using 1.0% agarose gel electrophoresis and the K5500 Plus microspectrophotometer (KAIAO Technology Development Co., Ltd., Beijing, China), respectively.

PCR Amplification and SSR Analysis
Fifteen highly polymorphic and reproducible EST-SSR markers of P. koraiensis developed in our laboratory were selected in this study to detect polymorphisms in the 16 sampled P. koraiensis populations. The primers of P. koraiensis were developed as described by Li et al. [23]. Eight capillary electrophoresis templates were amplified with fifteen primers synthesized by Sangon Biotech (Shanghai, China), and a universal M13 sequence (5′-TGTAAAACGACGGCCAGT-3′) labeled with four fluorescent dyes (TAMRA, FAM, HEX and ROX) was added at the 5′ end of the forward primers. DNA was diluted to a working concentration of 25 ng/µL. To detect SSR loci, polymerase chain reaction (PCR) was performed in a total volume of 20 µL containing 10 µL 2× Super PCR Mix (Beijing Genomics Institute Tech Solutions (Beijing Liuhe) Co., Ltd., Beijing, China), 2 µL template DNA, 0.8 µL forward primer (1 µM), 3.2 µL reverse primer (1 µM), 1 µL M13 primer with fluorescent label and 3 µL ddH2O. The PCR amplification conditions were as follows: 94 °C for 5 min; 30 cycles of 94 °C for 30 s, 57 °C for 30 s and 72 °C for 30 s; 8 cycles of 94 °C for 30 s, 55 °C for 30 s and 72 °C for 30 s; and a final extension at 72 °C for 10 min. The PCR products were subjected to 1.0% agarose gel electrophoresis and then analyzed by high performance capillary electrophoresis (HPCE) using an ABI 3730XL DNA Sequencer (Applied Biosystems, Foster City, CA, USA) to detect fragment size. The original sequence data were analyzed using GeneMapper (version 4.1) software.
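As a bookkeeping aid for the reaction recipe above, the following sketch checks that the listed components sum to the stated 20 µL and scales them into a master mix. The per-reaction volumes come from the text; the 10% pipetting overage, the choice to add template DNA per well rather than to the mix, and the function name are assumptions made for illustration only.

```python
# Per-reaction volumes in microlitres, as listed in the protocol text.
REACTION_UL = {
    "2x Super PCR Mix": 10.0,
    "template DNA": 2.0,
    "forward primer (1 uM)": 0.8,
    "reverse primer (1 uM)": 3.2,
    "M13 primer (fluorescent label)": 1.0,
    "ddH2O": 3.0,
}


def master_mix(n_reactions, overage=0.10, per_well=("template DNA",)):
    """Scale shared components for n reactions of one marker.

    `overage` (assumed 10%) compensates for pipetting loss; components
    named in `per_well` are excluded because they differ between samples.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    scale = n_reactions * (1.0 + overage)
    return {
        name: round(volume * scale, 2)
        for name, volume in REACTION_UL.items()
        if name not in per_well
    }


# Sanity check: the listed components should add up to the stated 20 uL.
total_ul = sum(REACTION_UL.values())

# Example: shared mix for 30 samples of a single marker.
mix_30 = master_mix(30)
```

Because the forward and reverse primers are marker-specific, a separate mix would be prepared for each of the fifteen markers.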

Data Analysis
GeneMapper was used to obtain the microsatellite allele data, and the Microsatellite toolkit v 3.1.14 was used to convert the data into the necessary format for analysis. The genetic diversity analysis was conducted using GENALEX software version 6.50 [77] with the following parameters: number of alleles (Na), effective number of alleles (Ne), observed (Ho) and expected (He) heterozygosity, number of rare alleles (NRA), Shannon diversity index (I), Hardy-Weinberg equilibrium (HWE), F-statistics (Fis, Fit and Fst) and Nei's genetic distance. The TBtools software [78] was used to plot the heatmap of expected heterozygosity (He). In addition, we calculated the polymorphism information content (PIC) values of each SSR primer using the PICcalc program [79]. Gene flow (Nm) was calculated as Nm = (1 − Fst)/(4 × Fst) and used to measure the degree of gene exchange among or within the 16 populations. ARLEQUIN software (version 3.5) [80] was used to analyze the level and sources of molecular genetic variation via AMOVA based on the evolutionary distances among and within the sampled populations and the observed genetic clusters. The total genetic variation was divided into three components: among groups, among populations within groups and within populations.
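The per-locus statistics named above can be illustrated with a short sketch. It assumes the standard textbook definitions: expected heterozygosity He = 1 − Σp_i², the Botstein et al. form of PIC, and the gene flow formula given in the text. It is not the internal code of GENALEX or PICcalc, and the function names and example allele frequencies are hypothetical.

```python
from itertools import combinations


def expected_heterozygosity(freqs):
    """He = 1 - sum(p_i^2) for the allele frequencies at one locus."""
    return 1.0 - sum(p * p for p in freqs)


def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(
        2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2)
    )
    return 1.0 - homozygosity - cross_term


def gene_flow(fst):
    """Nm = (1 - Fst) / (4 * Fst), as defined in the text."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("Fst must be in the interval (0, 1]")
    return (1.0 - fst) / (4.0 * fst)


# Hypothetical locus with two equally frequent alleles.
he_example = expected_heterozygosity([0.5, 0.5])             # 0.5
pic_example = polymorphism_information_content([0.5, 0.5])   # 0.375

# Mean Fst reported in this study.
nm_mean = gene_flow(0.177)
```

Applied to the mean Fst of 0.177 reported in this study, the formula gives an Nm of roughly 1.16. Note that writing the denominator as 4 × Fst in parentheses matters: evaluating (1 − Fst)/4 × Fst from left to right would give a different, much smaller value.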
To evaluate the population genetic structure of P. koraiensis, Bayesian clustering was performed in STRUCTURE software (version 2.3) [81] with the following settings: K values from 1 to 10, ten runs per K value, a burn-in period of 100,000 iterations and 100,000 Markov chain Monte Carlo (MCMC) repetitions after burn-in. The optimal K value for the number of populations was determined based on the delta-K values calculated by the Evanno method [82], as implemented in the online tool STRUCTURE HARVESTER [83], where a clear peak was observed in the plot of delta K. In addition, principal component analysis (PCA) was performed to evaluate the genetic relationships among different populations using GENALEX software version 6.50. Based on Nei's genetic distance (1983), a Neighbor-joining (NJ) phylogenetic tree of the populations was constructed using PowerMarker software (version 3.25) [84] and annotated and visualized using the online tool interactive Tree Of Life (iTOL) [85]. Geographic distance among populations was calculated as described in Li et al.'s study [76]. Finally, to detect the gene flow among the 16 populations, a relative migration network was constructed using the 'diveRsity' [86] package of R software (version 3.5.0) [87].
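The delta-K criterion mentioned above can be sketched as follows. This is a simplified reading of the Evanno method in which the second-order difference is taken on the mean log-likelihood across replicate runs and divided by the sample standard deviation at that K; it is not the code of STRUCTURE HARVESTER. The function name and the toy log-likelihood values are hypothetical and are not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute delta K from replicate STRUCTURE log-likelihoods.

    lnp_by_k maps each K to a list of Ln P(D) values, one per run.
    delta K = |L(K+1) - 2*L(K) + L(K-1)| / sd(L(K)), where L is the
    mean over runs; it is undefined for the smallest and largest K.
    """
    mean_l = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # needs both neighbours
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when runs are identical
        second_diff = abs(mean_l[k + 1] - 2.0 * mean_l[k] + mean_l[k - 1])
        delta[k] = second_diff / sd
    return delta


# Hypothetical toy values: three runs per K, K = 1..4.
toy_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-900.0, -901.0, -899.0],
    3: [-890.0, -895.0, -885.0],
    4: [-885.0, -880.0, -890.0],
}
toy_delta = evanno_delta_k(toy_runs)
best_k = max(toy_delta, key=toy_delta.get)
```

In the toy data the likelihood rises sharply from K = 1 to K = 2 and then plateaus, so delta K peaks at K = 2, which is the kind of clear peak described in the text.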

Conclusions
This study investigated the genetic diversity and population structure of natural P. koraiensis populations in northeast China and proposed some conservation strategies for this valuable conifer species. This study is the first comprehensive report of the genetic diversity of natural P. koraiensis populations in China. We found that the existing P. koraiensis populations in China maintain high levels of genetic diversity, which provides a foundation for germplasm innovation and genetic improvement of P. koraiensis. The population genetic analysis in this study identified two independent genetic units (Liangshui and Helong populations) that exhibit high degrees of genetic differentiation. The populations distributed in the Xiaoxinganling Mountains are highly genetically diverse and may represent the central population of natural P. koraiensis in China. Furthermore, the genetic structure of P. koraiensis populations identified in this study is consistent with the geographical distribution of these populations in China. These results have significance for the protection of natural P. koraiensis germplasm resources in China as well as for developing improved genotypes through breeding. Our findings provide genetic information useful for future genome-wide association studies (GWAS), marker-assisted selection (MAS) and genomic selection (GS) studies. It is therefore recommended that further research on genetic improvement for timber and cone production be conducted using marker-assisted selection and/or genomic selection, and that genotype-by-environment interaction studies be carried out to identify suitable site-specific genotypes.
Author Contributions: Conceptualization, X.Z. and X.L.; methodology, X.L., Y.L. and M.Z.; validation, X.L., Y.X. and M.T.; resources, X.L.; writing-original draft preparation, X.L.; writing-review and editing, X.L., M.T., and X.Z.; supervision, X.Z.; project administration, X.Z.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement: Data for this study can be made available upon reasonable request to the authors.

Conflicts of Interest:
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.