Gene Introgression among Closely Related Species in Sympatric Populations: A Case Study of Three Walnut ( Juglans ) Species

: Gene introgression usually results from natural hybridization occurring among closely related species in sympatric populations. In this study, we discussed two rare and frequent gene ﬂow phenomena between three species of Juglans plants and analyzed the possible causes for the di ﬀ erence. We collected 656 individuals from 40 populations of Persian walnut ( Juglans regia L.), Chinese walnut ( J. cathayensis Dode), and Iron walnut ( J. sigillata Dode) that were genotyped at 17 expressed sequence tag simple sequence repeat (EST-SSR) loci to analyze the introgressions between J. regia and J. cathayensis and J. regia and J. sigillata. Our study compared the spatial patterns of expected heterozygosity ( H E ), allelic richness ( Rs ), and private allele richness ( P AR ) so as to vividly infer the biogeographic history of related species of Juglans in the two regions. The results of the PCoA, UPGMA, and STRUCTURE analyses showed that all J. regia and J. sigillata populations clustered into one group, and the J. cathayensis populations clustered into the other group. The results of the historical gene ﬂow analysis indicated that J. regia and J. sigillata have no genetic barriers, and the directional gene ﬂow is mainly from J. regia to J. sigillata. For the three species of Juglans , all the above results indicated that gene ﬂow was common among the same group of Juglans , and only rare and low-level gene ﬂow appeared in distinct groups. Therefore, our study revealed multiple phenomena of gene ﬂow and introgression among closely related species in sympatric populations, thereby providing a theoretical basis for the genetic evolution of the genus Juglans


Introduction
Interspecies introgression can cause base substitution and deletion of DNA fragments, leading to a decrease in species specificity, which is likely to promote adaptive evolution of the infiltrated species [1][2][3]. Interspecific hybridization occurs frequently in sympatric forest trees. Morphology may provide some useful information to identify hybrids, but molecular markers are more reliable at identifying hybrids. Nuclear DNA sequences can be used to differentiate among closely related lineages, to clarify cases of sympatric speciation, and to identify hybrid individuals [4][5][6][7]. Studies have shown that hybridization and gene flow between species occur extensively in the genus Populus [8][9][10][11][12]. One of these studies showed that Populus × jrtyschensis is typical of F1 dominated hybrid zones between the distantly related species, P. nigra and P. laurifolia [12]. In the genus Eriobotrya Lindl., compelling evidence indicated that most haplotypes of E. prinoides var. Daduheensis were also shared with those of E. japonica and E. prinoides to produce a hybrid status for E. prinoides var. Daduheensis [13]. However,

Microsatellite Data Analysis
The evaluation of the genetic diversity for each locus and population was based on the following descriptive summary statistics: the number of alleles (A), the number of effective alleles (Ne), the observed (H O ) and expected (H E ) heterozygosity, the unbiased expected heterozygosity (UH E ), the polymorphic information content (PIC), the within-population inbreeding coefficient (F IS ), the total population inbreeding coefficient (F IT ), and the among-population genetic differentiation coefficient (F ST ) using GenAlEx6.5 [45]. Allelic richness (Rs) and private allele richness (P AR ) were calculated by rarefying to the minimum sample size across populations [46] (Tables S1 and S2). Subsequently, the expected heterozygosity (H E ), the allelic richness (Rs), and the private allele richness (P AR ) of the 40 populations sampled were used to show the geographic pattern by using the inverse distance weighted (IDW) interpolation function implemented in the Geographic Information System software ArcGIS 10.0 (ESRI, Redlands, CA, USA). The program, Arlequin version 3.5 [47], was used to analyze the molecular variance (AMOVA) of each species, among populations within species, and within populations among individuals. We calculated FST to identify the outlier loci potentially under selection [48]. Loci under selection pressure can bias the population's genetic analysis because most methods for evaluating genetic structure assume neutrality [49,50] (p ≤ 0.05; Figure S2). GENEPOP Version 4.2 [51] was used to test all loci of the Hardy-Weinberg equilibrium (HWE). Markers were also tested for the linkage disequilibrium (LD) at each population, and the locus was tested using FSTAT [52]. Significance was tested using 1000 permutations. CERVUS version 3.0 [53] was used to calculate the polymorphic information content (PIC). Each locus was evaluated for the possible presence of a null allele using MICRO-CHECKER 2.2.3 [54]. (UPGMA) tree based on Nei's genetic distance [58] was utilized using POPTREE2 [59] and visualized with the software Fig Tree 1.4.2 [60]. MIGRATE 3.6.4 [61] was used to estimate the historical gene flow parameter (M) and the frequency of migration events through the coalescent history between sympatric J. regia and J. sigillata populations in Yunnan Province and J. regia and J. cathayensis populations in the Qinling Mountains for the EST-SSRs. The mode and 95% highest posterior density were then estimated after checking for data convergence.  Genetic structure analysis was implemented in STRUCTURE version 2.3.4 using a Bayesian clustering approach [55]. A total of nine independent simulations were run for each K (1-10), with 50,000 burn-in steps followed by 100,000 MCMC steps, using the LOCPRIOR model (admixture model with sample locations as prior information) with dependent allele frequencies. The program STRUCTURE HARVESTER was used to calculate the optimal value of K using the delta K criterion [56], while the inferred clusters were drawn as colored box plots using the program DISTRUCT [57]. GenAlEx 6.5 was used for principal coordinate analysis (PCoA) to explore the overall genetic variation of the EST-SSR data. The unweighted pair-group method with an arithmetic means (UPGMA) tree based on Nei's genetic distance [58] was utilized using POPTREE2 [59] and visualized with the software Fig Tree 1.4.2 [60]. MIGRATE 3.6.4 [61] was used to estimate the historical gene flow parameter (M) and the frequency of migration events through the coalescent history between sympatric J. regia and J. sigillata populations in Yunnan Province and J. regia and J. cathayensis populations in the Qinling Mountains for the EST-SSRs. The mode and 95% highest posterior density were then estimated after checking for data convergence.

Nuclear Microsatellite Diversity
Genetic diversity estimates varied among the microsatellite loci (Table S1) and among the population ( Table 1). The allelic diversity across the loci was highly polymorphic and is summarized in Table S1. Most loci showed a significant departure from the Hardy-Weinberg equilibrium (HWE) test, which was performed for all individuals using Genepop with 1000 permutations. A total of 55 alleles were detected over the 17 loci among the 656 sampled individuals, with the total number of alleles (A) per locus ranging from 2 to 7 and having a mean of 3.22 (Table S1). Locus JR1817 had the largest number of alleles (A = 7). A private allele was identified at a high frequency (0.19) at the GS-r sample site. The observed (Ho) and expected (H E ) heterozygosity varied from 0.117 to 0.668 and from 0.135 to 0.671, with averages of 0.386 and 0.396, respectively (Table S1). The within-population inbreeding coefficient (F IS ) was significantly negative for JC8125, JH89978, JM5969, JR4616, JH42753, JH86514, JH6044, and JH2096, and was significantly positive for the remaining nine loci. An F IS value significantly greater than zero indicated heterozygote deficiency, which may be the result of the inbreeding extent of the larger population. The inbreeding coefficient was determined (F IT ) at each locus, ranging from 0.107 to 0.784, with an average of 0.441. The genetic differentiation (F ST ) ranged from 0.188 to 0.77, with a mean of 0.429, indicating strong genetic differentiation among the populations (Table S1). Eight of the seventeen loci were detected to be F ST outliers: JC3412, JM5969, JC7329, JR4964, JR4616, JM78331, JR6160, and JR1817 ( Figure S2).

Genetic Diversity within and among Juglans Population
Genetic diversity estimates for each population based on the 17 EST-SSRs are reported in Table 1. The values of the A, Ne, Ho, H E , UH E , Rs, and P AR of each population ranged from 1.176 to 5.706, 1.118 to 3.21, 0.165 to 0.566, 0.092 to 0.611, 0.097 to 0.698, 1.960 to 4.260, and 0.000 to 0.160, respectively (Table 1) (Table 1). For J. sigillata, higher values of H E , Rs, and P AR were observed in northern Yunnan, while some of the populations from south Yunnan province had the lowest numbers (Figure 3b,d,f). For J. regia, a high value of H E was found in northern Yunnan (Figure 3a). Interestingly, the result of Rs in J. regia populations from Yunnan Province was the opposite of the P AR result (Figure 3c,e). AMOVA showed that the percentage of variation and the coefficient of genetic differentiation at these loci (F ST ), between J. regia and J. sigillata, J. regia, and J. cathayensis, was 26.53% (p < 0.001), 33.77% (p < 0.001), and 0.406 (p < 0.001) and 0.412 (p < 0.001), demonstrating that the difference between J. regia and J. cathayensis was significantly higher than that of J. regia and J. sigillata ( Table 2). The AMOVA results of J. regia, J. sigillata, and J. cathayensis species showed that the proportion of variance mainly existed within populations (J. regia 79%, p < 0.001; J. sigillata 88%, p < 0.001; J. cathayensis 88%, p < 0.001) ( Table 2). The Migraten analysis produced θ and M values greater than zero. The θ-value and the size of the immigration rate (M) revealed a highly asymmetric historical gene flow among the three species. The results of the historical gene flow analysis indicate that J. regia and J. sigillata have no genetic barriers, and the directional gene flow is mainly from J. regia to J. sigillata (4.63 versus 2.19) ( Table 3). J. regia and J. cathayensis populations in the Qinling Mountains showed lower historical gene flow (0.85 versus 0.93) ( Table 3).

Population Structure of J. regia, J. cathayensis and J. sigillata
According to the whole-range STRUCTURE analysis using three datasets (based on all three walnut species, only sympatric J. regia versus J. cathayensis populations and J. regia versus J. sigillata populations), the values of the log-likelihood of the data, loge Pr (K), increased progressively with an increase in K, but delta K indicated that the optimal values for K were all 2 [55,62] (Figure S3a-c). The proportions of each individual in each population that were assigned into two clusters (clusters I and II) are shown in Figure 4. When all samples (from all three walnut species) were included in the STRUCTURE analysis (which was based on the EST-SSR data), the highest mean posterior probability value was detected for K = 2 (Figure 4b). At K = 2, all J. regia and J. sigillata populations were clustered into one group, and the J. cathayensis populations into the other. According to Nei's genetic distances, a phylogenetic tree was established by the unweighted pair-group method with arithmetic means    Table 1. IDW interpolation of (a,c,e) expected heterozygosity (He), allelic richness (Rs), and private allele richness (P AR ) values of J. regia; (b,d,f) expected heterozygosity (He), allelic richness (Rs), and private allele richness (P AR ) values of J. sigillata.

Population Structure of J. regia, J. cathayensis and J. sigillata
According to the whole-range STRUCTURE analysis using three datasets (based on all three walnut species, only sympatric J. regia versus J. cathayensis populations and J. regia versus J. sigillata populations), the values of the log-likelihood of the data, loge Pr (K), increased progressively with an increase in K, but delta K indicated that the optimal values for K were all 2 [55,62] (Figure S3a-c). The proportions of each individual in each population that were assigned into two clusters (clusters I and II) are shown in Figure 4. When all samples (from all three walnut species) were included in the STRUCTURE analysis (which was based on the EST-SSR data), the highest mean posterior probability value was detected for K = 2 (Figure 4b). At K = 2, all J. regia and J. sigillata populations were clustered into one group, and the J. cathayensis populations into the other. According to Nei's genetic distances, a phylogenetic tree was established by the unweighted pair-group method with arithmetic means (UPGMA) methods (Figure 4c). Figure 3c shows that three Juglans species were clustered into two branches. Cluster I included all J. regia and J. sigillata populations, in which the J. sigillata populations were classified as J. regia populations; the other consisted of J. cathayensis populations. In addition, the PCoA (Figure 4e), based on genetic distance, showed a similar genetic structure to that detected by the STRUCTURE analysis (Figure 4b). For three species of Juglans, all the above results indicate that gene introgression is common among the same groups of Juglans, as only rare and low-level gene introgression appeared in distinct groups.
For sixteen pairs with sympatric distributions of J. regia and J. cathayensis from all of sampling regions, the ∆K values [62] computed for all K classes indicated a strong signal for K = 2 (Figure 4a and Figure S4b). The Bayesian clustering approach revealed that cluster I was the J. regia  (Figure 5b). The negative correlation between pollen dispersal distance and population density has been reported in many wind-driven forest species [63,64]. Therefore, the high-density distribution of Juglans may, to some extent, limit the pollen dispersion distance. Subsequently, we used the four pairs of sympatric J. regia and J. cathayensis populations (1-QL-r versus 25-BY-c; 5-NS-r versus 29-NS-c; 8-LT-r versus 32-LT-c; 12-SL-r versus 36-SL-c), with relatively greater gene flow, to carry out the structure analysis, and the optimal K value was 2 (Figure 5d-g). The overall proportions of hybrids estimated by the four results of the STRUCTURE analysis were 4.08% (2/49), 5.41% (2/37), and 6.66% (1/15), respectively. In summary, only rare and low-level gene introgression occurred between sympatric J. regia and J. cathayensis populations.  (3.98-4.39) 0.85 (0.74-0.98) θ is four times the effective population size multiplied by mutation rate per site per generation; M is the immigration rate divided by the mutation rate.  (Figure 5b). The negative correlation between pollen dispersal distance and population density has been reported in many wind-driven forest species [63,64]. Therefore, the high-density distribution of Juglans may, to some extent, limit the pollen dispersion distance. Subsequently, we used the four pairs of sympatric J. regia and J. cathayensis populations (1-QL-r versus 25-BY-c; 5-NS-r versus 29-NS-c; 8-LT-r versus 32-LT-c; 12-SL-r versus 36-SL-c), with relatively greater gene flow, to carry out the structure analysis, and the optimal K value was 2 (Figure 5d-g). The overall proportions of hybrids estimated by the four results  of the STRUCTURE analysis were 4.08% (2/49), 5.41% (2/37), and 6.66% (1/15), respectively. In summary, only rare and low-level gene introgression occurred between sympatric J. regia and J. cathayensis populations. The genetic structures of the four pairs of J. regia and J. sigillata populations with sympatric distribution from Yunnan province were evaluated by Bayesian cluster analysis using STRUCTURE. The log-likelihood values reached three peaks when K > 2, and the values of delta K were highest when K was 2 [55,62] (Figure S3c). At K = 2, cluster I (west group) included four J. regia populations found in Yunnan province, and cluster II (east group) was comprised of four J. sigillata populations from Yunnan province (Figure 5h,i). Two populations (17-QZ-Y-r, 20-BS-Y-r) between the west and the east cluster were identified as admixed. Consistently, the number of subclusters for each larger K was much lower than the number of sampled populations. The results of the STRUCTURE analysis inevitably indicate extensive gene flow, either historical or ongoing (Figure 5i).  Landscape genetics can be regarded as a self-explanatory tool, which can transmit information about the spatial distribution of genetic variation into practice [65]. However, few scholars have combined landscape genetics and genetic diversity studies to analyze the biogeographical history of The genetic structures of the four pairs of J. regia and J. sigillata populations with sympatric distribution from Yunnan province were evaluated by Bayesian cluster analysis using STRUCTURE. The log-likelihood values reached three peaks when K > 2, and the values of delta K were highest when K was 2 [55,62] (Figure S3c). At K = 2, cluster I (west group) included four J. regia populations found in Yunnan province, and cluster II (east group) was comprised of four J. sigillata populations from Yunnan province (Figure 5h,i). Two populations (17-QZ-Y-r, 20-BS-Y-r) between the west and the east cluster were identified as admixed. Consistently, the number of subclusters for each larger K was much lower than the number of sampled populations. The results of the STRUCTURE analysis inevitably indicate extensive gene flow, either historical or ongoing (Figure 5i).

Discussion
4.1. Genetic Diversity and the Spatial Genetic Structure of J. regia, J. sigillata, and J. cathayensis Landscape genetics can be regarded as a self-explanatory tool, which can transmit information about the spatial distribution of genetic variation into practice [65]. However, few scholars have combined landscape genetics and genetic diversity studies to analyze the biogeographical history of a species [65,66]. Previous studies have suggested that Rs and H E are very useful indicators to measure genetic diversity and are considered ideal indicators to determine conservation priorities [67]. Here, we compare the spatial manifestations of He, Rs, and P AR , so as to vividly infer the biogeographic history of walnut-related species in the two regions (Figures 2 and 3). Interestingly, the high He and Rs values of J. regia populations were observed in the eastern and western of Qinling Mountains, whereas the low values of P AR appeared in the same regions. This may be due to the fact that J. regia populations, as important nut trees in the eastern and western regions of the Qinling Mountains, experienced more frequent artificial disturbance, which increased genetic diversity in these areas and erased ancient signs of refugia. Unlike the J. cathayensis populations in Qinling Mountains, J. regia and the J. sigillata populations in Yunnan are an economic species, and domesticated species tend to become more scattered and exhibit relatively lower abundance due to human selection and human-mediated dispersal of selected genotypes, which also explains why the genetic diversity of J. cathayensis populations is higher than that of J. regia and J. sigillata populations. The same reason also explains the opposite distribution of the Rs and P AR values of J. regia populations in Yunnan.
This study is the first to explore the interspecific introgression of Chinese Juglans plants and provides a theoretical basis for the genetic evolution of the genus Juglans. In this study, we estimated the genetic diversity and genetic structure of 40 walnut populations sampled across its sympatric range using 17 microsatellite markers (nine neutral EST-SSRs and eight non-neutral EST-SSRs, Figure S2). For our analyses, eight of the seventeen loci were detected to be F ST outliers: JC3412, JM5969, JC7329, JR4964, JR4616, JM78331, JR6160, and JR1817 ( Figure S2). The microsatellite loci screened in this study may be located within functional genes. All loci were annotated using Pfam, the GO database, and blastx of NCBI [68]. EST-SSR stands for expressed sequence tags, in coding regions and potentially under selection [43,68]. The STRUCTURE results based on the neutral loci and selected loci were not significantly and consistently the same among the three Juglans species (Figure 3). The structure analysis of the two groups of Juglans based on the EST-SSR data indicated that J. regia and J. cathayensis have strong reproductive isolation and rarely exhibit gene introgression phenomena, while J. regia and J. sigillata have no obvious genetic boundaries and frequently exhibit gene introgression phenomena (Figures 4 and 5). These findings are consistent with those of of the UPGMA tree (Figure 4c), PCoA (Figure 4e), and the historical gene flow analysis ( Table 3). Because of the limitation of pollen diffusion, we used the four pairs of sympatric J.regia versus J.cathayensis populations (1-QL-r versus 25-BY-c; 5-NS-r versus 29-NS-c; 8-LT-r versus 32-LT-c; 12-SL-r versus 36-SL-c) with relatively more gene flow to carry out structure analysis (Figure 5d-g). In summary, strong reproductive isolation and low-level gene introgression occurred between sympatric J. regia and J. cathayensis populations. Our study provides an additional study on the three species of Juglans to provide theoretical support for the genetic conservation of intergeneric plants. The experimental objects of this study are important economic tree species, which have experienced too much human interference. To further study the gene flow between plant genera, more forest tree species need to be studied.

Species Classification of J. regia and J. sigillata
A species in the center of its habitat may present some minor adaptive changes from time to time, but they are only varieties and retain a certain geographical distribution. Previous studies on the morphology, molecular biology, phenological, and fossils of Juglans have shown that there are five species of Juglans in China, which are classified into two groups: section Juglans and section Cardiocaryon. Section Juglans includes the two cultivated walnuts, J. regia and J. sigillata. However, there is a long-standing controversy about whether J. sigillata is a variety or an independent species. Although there are obvious morphological differences between J. regia and J. sigillata, many studies have indicated that these two species have no genetic barriers. Structural analysis and gene flow analysis based on four pairs of sympatric populations in Yunnan showed that J. regia and J. sigillata have obvious gene flow, and directional gene flow has been mainly from J. regia to J. sigillata. Combined with previous research results, we prefer to support the view that J. sigillata is a sub-species or, perhaps, a landrace of J. regia. Zhao et al. [29] and Yuan et al. [43] have also provided powerful evidence for the establishment of this hypothesis through other molecular techniques. However, some studies have shown that, although there is a high level of gene flow between oak (Quercus) and pine (Pinus) tree species [69][70][71], they both still maintain the integrity of their species, which also reflects the complex phenomenon of gene introgression and hybridization between species.

Influencing Factors of Distinct Gene Introgression among Three Juglans Species
Hybridization may have played an important role in the genetic differentiation of Juglans, and the phenomenon of interspecific hybridization has been strongly confirmed [36][37][38][39][40][41][42][43]. Accordingly, we selected two groups of walnut species to study. The two cultivated walnuts of section Juglans (J. regia and J. sigillata) were from Yunnan province, while J. regia of section Juglans and J. cathayensis of section Cardiocaryon were from the Qingling Mountains. It is noteworthy that the natural coexistence of J. regia and J. sigillata is very narrow. The results of the STRUCTURE analysis (Figures 4 and 5), the UPGMA tree (Figure 4c), PCoA (Figure 4e), and the historical gene flow analysis (Table 3) indicate that gene introgression is common among the same groups of Juglans, as only rare and low-level gene introgression appears in distinct groups.
In plants, interspecific gene introgression often occurs due to the hybridization of closely-related species distributed in the same region. The analysis results of the genetic diversity and genetic structure of J. regia and J. sigillata showed that J. regia and J. sigillata have frequent gene flow, which might be due to the dispersal of pollen or seeds to achieve gene introgression among populations, further reducing the genetic divergence of different species. The persion walnut and the iron walnut, which are important economic tree species in China, have been seriously disturbed by human activities. This disturbance may also interfere with the two-way direction of gene introgression between the two closely-related species. However, compared with the frequent gene flow between two closely related species (J. regia and J. sigillata), the STRUCTURE results showed that there was a strong reproductive isolation mechanism between the distant species, J. regia and J. cathayensis, in different groups, and hybridization is a rare event. We speculated that the possible reasons for this rarity are related to the reproductive isolation mechanism. A reproductive isolation mechanism among some species is sufficient to maintain the integrity of the species, but relevant studies have shown that multiple isolation mechanisms need to work together among two related species, as an isolation mechanism can almost completely block the gene flow between the two species [72]. However, even if the probability of hybridization is very low, it is normal to occasionally find individual hybridization events. Natural hybridization and gene introgression are ubiquitous among and within genera of plants, thereby providing the main impetus for plant evolution. However, the overwhelming majority of species retain species stability, even at a high levels of gene flow. This also indicates that there may be complex mechanisms for maintaining stability among species, or, possibly, genes controlling species' divergence may not change due to gene flow among species, thereby maintaining the independence of the species.

Conclusions
In this study, 656 individuals from 40 Persian walnut, Chinese walnut, and iron walnut populations were genotyped with 17 EST-SSRs to analyze the gene introgression among J. regia and J. cathayensis, J. regia, and J. sigillata. In summary, J. regia and J. cathayensis showed strong reproductive isolation and rare gene introgression phenomena, and J. regia and J. sigillata had no obvious genetic boundaries and frequent gene introgression phenomena. For the three species of Juglans, all of the above results indicate that gene introgression is common among species in the same group of Juglans, and only rare and low-level gene introgression appears in distinct groups. Our study provides information on three species of Juglans to provide theoretical support for the genetic conservation of intergeneric plants.
The experimental object of this study was to analyze important economic tree species which have experienced too much human interference. To further study the gene flow between plant genera, more forest tree species need to be studied.