Impact of Gene Flow and Introgression on the Range Wide Genetic Structure of Quercus robur (L.) in Europe

As for most other temperate broadleaved tree species, large-scale genetic inventories of pedunculate oak (Quercus robur L.) have focused on the plastidial genome, which showed the impact of post-glacial recolonization and manmade seed transfer. However, how have pollen mediated gene flow and introgression impacted the large-scale genetic structure? To answer these questions, we did a genetic inventory on 1970 pedunculate oak trees from 197 locations in 13 European countries. All samples were screened with a targeted sequencing approach on a set of 381 polymorphic loci (356 nuclear SNPs, 3 nuclear InDels, 17 chloroplast SNPs, and 5 mitochondrial SNPs). In a former analysis with additional 1763 putative Quercus petraea trees screened for the same gene markers we obtained estimates on the species admixture of all pedunculate oak trees. We identified 13 plastidial haplotypes, which showed a strong spatial pattern with a highly significant autocorrelation up to a range of 1250 km. Significant spatial genetic structure up to 1250 km was also observed at the nuclear loci. However, the differentiation at the nuclear gene markers was much lower compared to the organelle gene markers. The matrix of genetic distances among locations was partially correlated between nuclear and organelle genomes. Bayesian clustering analysis revealed the best fit to the data for a sub-division into two gene pools. One gene pool is dominating the west and the other is the most abundant in the east. The western gene pool was significantly influenced by introgression from Quercus petraea in the past. In Germany, we identified a contact zone of pedunculate oaks with different introgression intensity, likely resulting from different historical levels of introgression in glacial refugia or during postglacial recolonization. The main directions of postglacial recolonization were south to north and south to northwest in West and Central Europe, and for the eastern haplotypes also east to west in Central Europe. By contrast, the pollen mediated gene flow and introgression from Q. petraea modified the large-scale structure at the nuclear gene markers with significant west–east direction.


Introduction
The genetic differentiation of the maternally inherited chloroplast genome of European white oak species has been intensively studied for more than 20 years [1][2][3]. The screening of chloroplast haplotype variation of sessile (Quercus petraea) and pedunculate oaks (Quercus robur) at the continental level has revealed a clear large-scale structure, which most probably resulted from postglacial recolonization [4]. During the last glacial maximum, pedunculate oaks survived in refugia in southern and south-eastern parts of Europe. The oaks were fixed to different haplotypes in these refugia, and the directions of recolonization can be reconstructed using the large-scale genetic structure of the haplotypes [4]. More precisely, one lineage from the Iberian Peninsula has colonized many parts of western Europe, a lineage from Italy has migrated north to many parts of Central Europe and other haplotype lineages have migrated from the Balkan-region to the north in Central and East Europe [4]. The haplotype migration history of pedunculate oak in the eastern part of the natural distribution range in Russia and neighboring countries is less studied. Recently two studies have shown that the area in Belarus, Ukraine and western Russia have been colonized by haplotypes from the Balkan region, whereas the eastern and south-eastern part up to the Ural Mountains is showing other haplotypes with an origin in the southern Ural and southern Russia [2,5]. In summary, the pattern of oak haplotypes in Europe resulted from south to north, south to north-west and also south to north-east seed migration routes. For several other temperate tree species as such as Fagus sylvatica, Fraxinus excelsior, Picea abies, Abies alba, and Pinus sylvestris similar recolonization routes as for white oaks have been found in Europe [6].
Other factors that have influenced the present haplotype pattern are the hybridization among different white oak species [7] and to a smaller extent a manmade gene flow by seed transfer [8,9].
In contrast to the abundant literature relying on the chloroplast genome, there are no large-scale surveys on the genetic differentiation of pedunculate oak for the nuclear genome. Most studies are focused on comparison between a few regional stands using nuclear microsatellites, such as comparisons between different stands in Russia [10], Ireland [7], Finland [11], Germany [12,13], Bulgaria, and Greece [9]. The few works with SNP-arrays for pedunculate oak have used either only a few individuals or individuals from a specific region or from a single country, and are addressing the question of species hybridizations or adaptation following candidate gene approaches [13][14][15] For other temperate tree species there are also only few comparative studies dealing with the large-scale genetic structure at gene markers of the chloroplast and nuclear genome. For example, Bai et al. [16] observed a weaker but still clear genetic structure using nuclear markers in contrast to a prominent phylogeographic structure with chloroplast markers in a Juglans species and Thomson et al. [17] studied nuclear microsatellites and chloroplast markers for several birch species in North America characterizing postglacial recolonization and level of introgression.
The large-scale geographic pattern of the nuclear genome is probably influenced by several population genetic processes in different ways: genetic drift and migration barriers are expected to create local departures from a large-scale pattern. The effects of hybridization and introgression on the gene pool of the European white oaks are well known [14,18] but have never been studied for their impact on the large-scale genetic pattern. Finally, selection should only shape a large-scale genetic pattern in the presence of local adaptation (genotype environment association). However, the last should only involve a part of the genome. We therefore expect stronger effects by gene flow. There are several studies showing the impact of limited pollen and seed dispersal of pedunculate oak on small scale genetic structure [19,20], but the effectiveness of long-distance gene flow, particularly by pollen, has been poorly studied so far [21,22].
The objective of this paper was to analyze the genetic differentiation and spatial genetic structure of pedunculate oak for a large number of trees from many locations of the natural distribution range, using both an array of nuclear SNPs and plastid SNPs [23]. In a first manuscript, we focused on the large-scale structure of Quercus robur in Russia and neighboring countries [5]. In the present paper, we merged this data with those from 100 additional locations in Central and West Europe. The pedunculate oak trees in this paper were also part of a study on species admixture using genotype data of 4848 Q. robur and Q. petraea trees (unpublished data). By combining genotype data of the nuclear and organelle genomes with data on the level of individual admixture we aim to identify the impact of gene flow by seeds and pollen as well as the influence of introgression on the range-wide genetic structure of Q. robur. We expected, in contrast to the known strong spatial differentiation at the plastidial genome, only limited genetic differentiation at the nuclear SNPs, due to extensive longdistance gene flow by pollen leading to gene exchange among individuals of different plastidial lineages. The impact of introgression on the large-scale genetic structure of the nuclear genome is linked to several questions: Was introgression from other species limited to a particular geographic location and historical time period, such as a single glacial refugium? In this case, we would expect additional genetic differentiation and a visible contact zone of pedunculate oaks with different introgression history. Or did introgression occur several times and independently at different places? In that case, we would not expect any additional genetic differentiation of the pedunculate oak populations.

Sampling
We collected samples from 1970 pedunculate oak trees in 197 locations (10 individuals at each location) in 13 European countries ( Figure 1, Supplementary S1). The majority of the samples (188 locations) were collected in forest stands, whereas the others (nine locations) were taken from provenance trials. To collect mature and unrelated individuals, we defined a minimum diameter at breast height of 20 cm and a minimum distance of 50 m between trees with exception of location 503 where only saplings could be found. The majority of the samples came from Germany and Russia due to the focus of the projects on those regions. The pedunculate oak samples of this study were part of a broader collection of 4848 pedunculate and sessile oaks in Europe. The species admixture of all 4848 samples was checked based on the genetic data with a principal component analysis and Bayesian clustering using the software STRUCTURE (Supplementary S2). All 1970 samples of this study had an admixture proportion of the Q. robur gene pool of at least 80%. impact of gene flow by seeds and pollen as well as the influence of introgression on the range-wide genetic structure of Q. robur. We expected, in contrast to the known strong spatial differentiation at the plastidial genome, only limited genetic differentiation at the nuclear SNPs, due to extensive longdistance gene flow by pollen leading to gene exchange among individuals of different plastidial lineages. The impact of introgression on the large-scale genetic structure of the nuclear genome is linked to several questions: Was introgression from other species limited to a particular geographic location and historical time period, such as a single glacial refugium? In this case, we would expect additional genetic differentiation and a visible contact zone of pedunculate oaks with different introgression history. Or did introgression occur several times and independently at different places? In that case, we would not expect any additional genetic differentiation of the pedunculate oak populations.

Sampling
We collected samples from 1970 pedunculate oak trees in 197 locations (10 individuals at each location) in 13 European countries ( Figure 1, Supplementary S1). The majority of the samples (188 locations) were collected in forest stands, whereas the others (nine locations) were taken from provenance trials. To collect mature and unrelated individuals, we defined a minimum diameter at breast height of 20 cm and a minimum distance of 50 m between trees with exception of location 503 where only saplings could be found. The majority of the samples came from Germany and Russia due to the focus of the projects on those regions. The pedunculate oak samples of this study were part of a broader collection of 4848 pedunculate and sessile oaks in Europe. The species admixture of all 4848 samples was checked based on the genetic data with a principal component analysis and Bayesian clustering using the software STRUCTURE (Supplementary S2). All 1970 samples of this study had an admixture proportion of the Q. robur gene pool of at least 80%.   Table S1 (Supplementary), natural ranges of Quercus robur and Quercus petraea according to EUFORGEN.

DNA Extraction and Genotyping
The DNA was extracted following a slightly modified protocol of Dumolin et al. [24]. For all samples, 359 polymorphic nuclear loci (356 SNPs, 3 InDels), 17 polymorphic chloroplast SNPs (cp-SNPs), and five mitochondrial SNPs (mt-SNPs) were analyzed using targeted genotyping by sequencing. The gene markers were initially developed using 95 Q. robur and Q. petraea individuals sampled over the species' ranges [23].
Sequencing was conducted by LGC Genomics GmbH with Illumina NextSeq, using single primer enrichment technology producing 75 bp single reads at 200x average coverage per sample. Reads were trimmed for adapter sequences and quality. Reads with average phred score of <30 over a window of 10 bases were trimmed. Reads shorter than 65 bp or reads containing Ns were discarded. Read alignments against self-assembled contigs were created with Bowtie2 v2.2.3 [25]. Variant calling was done with Freebayes v1.3.2 [26]. Nuclear variants were called with ploidy = 2 and organelle variants were called with ploidy = 1. Genotypes were filtered for a minimum coverage of 3 reads with vcftools [27]. All 1970 individuals had complete data for at least 90% of the loci. The proportion of missing data for each locus over all individuals was less than 5%.

Haplotypes
We tested if there was any recombination between the 17 chloroplast SNPs and the five mitochondrial SNPs. For that, we pooled all 1970 individuals to one single population and computed the p-values for linkage equilibrium using a probability test implemented in the program GenPop [28]. This test checks for association between haplotypes at two loci, also called a test of the composite linkage disequilibrium. There was linkage (p < 0.000001) between all 22 organelle loci. Thus, we considered the combination of alleles at the 22 plastid loci as multilocus haplotypes, called just "haplotypes" in the following text.
We used the DNA-sequences of the 17 chloroplast SNPs and the five mitochondrial SNPs to construct a haplotype network with the minimum spanning method [29] as implemented in the software POPART [30]. The haplotype network was aimed to give us insights into the phylogenetic relationship of the different haplotypes.

Genetic Differentiation
Genetic distance (D 0 ) was calculated using the allele frequencies at the 359 nuclear gene loci and the frequencies of haplotypes to quantify the genetic differentiation between pairs of locations [31]: where i and j represent two sampled locations, n is the number of alleles or haplotypes, p ik is the relative frequency of the k-th allele or haplotype. Further, we calculated delta [32] and Wright's F ST [33], as measures of genetic differentiation and fixation among populations. The genetic differentiation delta was computed as where j is the sampled location, p i is the frequency of allele i at locus k or frequency of haplotype i and p i is the frequency at allele i or haplotype i of all other locations (complement). By shifting individual genotypes among the sampled locations, we estimated the statistical significance of delta, F ST and D 0 with 1000 permutations. The calculations were done with the computer program GDA_NT 2021 [34].

Bayesian Clustering Analysis
We used the Bayesian clustering method implemented in the software STRUCTURE v.2.3.4, Stanford, California, USA [35] to check the number of genetic groups present in the 197 sampled locations. The model assumption was that the involved loci are not linked and that they are in Hardy-Weinberg Equilibrium. We included all 359 nuclear gene markers in this analysis. We set the length of burn-in and Markov chain Monte Carlo simulations to 15,000 and 100,000, respectively, and tested K values from 1 to 7 each 20 times. We ran the admixture model with correlated allele frequencies. Then, we computed delta K values to find the optimal number of genetic clusters in the data [36]. For each tested K value, data from the 20 STRUCTURE runs were summarized and graphically presented to obtain the final Q values for each individual with the programs CLUMPP v.1.1.2 [37] and CLUMPAK [38].

Spatial Structure and Maps
We studied the spatial structure of allele frequencies at the nuclear gene markers and the frequencies of multilocus haplotypes using the software SGS [39,40]. The genetic distance (D o ) was used for the genotypes and haplotypes to test spatial patterns in different geographic distance classes. By shifting the geographic co-ordinates of the sampled locations, the statistical significance was tested by 1000 permutations. The maps were drawn with ArcGIS 10.8 (Esri).

Haplotypes
We observed 13 different haplotypes in all 1970 individuals based on the 17 cp-SNPs and the five mt-SNPs with a clear spatial pattern (Figure 2a). The haplotype network (Figure 2b) showed different lineages: an eastern lineage with the green and yellow marked haplotypes (H06, H07; H08, H09, H10), the lineage including the blue and pink colored haplotypes H01, H03 and H13, the lineage with the black (H05) and grey (H04) haplotype, and the lineage with the red haplotypes (H11, H12). The yellow haplotype H07 is dominating the east of Russia and the green haplotype H06 was most frequent in the central and western parts of the species distribution range in Russia including a part of Ukraine. In the middle belt including the Balkan and Baltic countries as well as the north-west of Russia the haplotypes H01 and H03 were more common. In eastern Poland and southern and south-east Germany, haplotype H13 was the most frequent one (Figure 7a). In West and North Germany and in part of France, haplotype H5 was frequent. In West and North Germany large areas were dominated by the genetically quite different haplotype H11 (Figures 2a and 7a).
In a range up to 1250 km the genetic distance among the frequencies of haplotypes at the locations were significantly smaller than expected for a random spatial distribution ( Figure 3a).
We calculated the frequencies of haplotypes at each location. Based on these haplotype frequencies, we computed the genetic distances among locations, delta and F ST . The genetic distances D 0 had an arithmetic mean of 0.83 and varied between 0.00 and 1.00. The population fixation F ST for the haplotypes was 0.83 and delta 0.78. Both values were highly significant (p < 0.001).

Nuclear Markers
We computed the allele frequencies at the 359 nuclear gene loci at each of the 197 locations. Based on these allele frequencies we calculated the genetic distances D 0 among locations, delta and F ST . The permutation tests of the gene pool distances among all 19,306 pairs of locations were statistically significant (p < 0.05) in 87% of all cases. The gene pool distances had an arithmetic mean of 0.094 and varied between 0.062 and 0.159.

Nuclear Markers
We computed the allele frequencies at the 359 nuclear gene loci at each of the 197 locations. Based on these allele frequencies we calculated the genetic distances D0 among locations, delta and FST. The permutation tests of the gene pool distances among all 19,306 pairs of locations were statistically significant (p < 0.05) in 87 % of all cases. The gene pool distances had an arithmetic mean of 0.094 and varied between 0.062 and 0.159.
Up to a range of 1250 km, the gene pool distances among locations were smaller than expected for a random pattern (Figure 3b). This illustrates the range of a positive spatial autocorrelation. The population fixation over all loci FST was 0.077 and delta 0.069. Both values were highly significant (p < 0.001). Up to a range of 1250 km, the gene pool distances among locations were smaller than expected for a random pattern (Figure 3b). This illustrates the range of a positive spatial autocorrelation. The population fixation over all loci F ST was 0.077 and delta 0.069. Both values were highly significant (p < 0.001).
The results of the STRUCTURE analysis showed that based on delta K [36], the best representation of the data is achieved if two groups are considered and there was a second much smaller peak for K = 5 ( Figure 4). With K = 2, we observed a clear geographic pattern with one gene pool dominating the west (blue color in Figure 5a) and another gene pool dominating the east (green color). There was an area including parts of Germany, Poland, the Balkan and Baltic states, Belarus, and Ukraine with locations that had a mixture of both gene pools. For K = 3 ( Figure 5b) there was still a gene pool frequent in the west (blue color) and in the east (green color) but there appeared a third gene pool (red color) most frequent in the area between those two groups. With four gene pools (K = 4, Figure 6a), the eastern gene pool was further subdivided into two gene pools (green + yellow color). Finally, with five gene pools (K = 5, Figures 6b and 7b) an additional gene pool (black color) was visible in the central area (East Germany, Balkan, Poland, Baltic States, Belarus, Ukraine). Within Germany we observed two zones: the north-west with higher frequency of K1 and the south and north-east with K3 as the most abundant gene pool (Figure 7b). The results of the STRUCTURE analysis showed that based on delta K [36], the best representation of the data is achieved if two groups are considered and there was a second much smaller peak for K = 5 ( Figure 4). With K = 2, we observed a clear geographic pattern with one gene pool dominating the west (blue color in Figure 5a) and another gene pool dominating the east (green color). There was an area including parts of Germany, Poland, the Balkan and Baltic states, Belarus, and Ukraine with locations that had a mixture of both gene pools. For K = 3 (Figure 5b) there was still a gene pool frequent in the west (blue color) and in the east (green color) but there appeared a third gene pool (red color) most frequent in the area between those two groups. With four gene pools (K = 4, Figure 6a), the eastern gene pool was further subdivided into two gene pools (green + yellow color).  Finally, with five gene pools (K = 5, Figures 6b and 7b) an additional gene pool (black color) was visible in the central area (East Germany, Balkan, Poland, Baltic States, Belarus, Ukraine). Within Germany we observed two zones: the north-west with higher frequency of K1 and the south and north-east with K3 as the most abundant gene pool (Figure 7b).

Figure 4. Delta K values for the results of the analysis with STRUCTURE.
We computed the Pearson correlation coefficient between the different genetic groups for K2 to K5 and the admixture proportion of Quercus petraea (Supplementary S2) for all 197 locations and observed a significant positive correlation (p < 0.001) with the first genetic group indicated as blue color in Figures 5a,b, 6a,b, and 7b: r = 0.51 for K2_1, r = 0.72 for K3_1, r = 0.77 for K4_1, and r = 0.81 for K5_1.
(a) color) was visible in the central area (East Germany, Balkan, Poland, Baltic States, Belarus, Ukraine). Within Germany we observed two zones: the north-west with higher frequency of K1 and the south and north-east with K3 as the most abundant gene pool (Figure 7b).

Genetic Correlation among Nuclear and Organelle Genome
We observed a significant positive Pearson's correlation coefficient of r = 0.37 (p < 0.0001) among the pairs of gene pool distances at the 359 nuclear gene markers and the genetic distances of the haplotypes. In Germany, we found a similar pattern of genetic differentiation for both types of genomes with a nearly identical contact zone (Figure 7a,b). However, due to the high level of haplotype diversity, the correlation coefficient among the matrix of genetic distances of nuclear gene pools and haplotypes including only locations in Germany had a correlation of only r = 0.15 (p < 0.001).

Impact of Seed and Pollen Mediated Gene Flow
For both the nuclear and organelle SNPs a positive spatial autocorrelation up to 1250 km was observed. The strength of genetic differentiation was much larger for the haplotypes. The distribution pattern of haplotypes is best explained by recolonization after the last glacial period from refugia in the Iberian Peninsula, Italy, the Balkan region, southern Russia, and the southern Ural region [1,2,4]. The locations of refugia and the postglacial recolonization history of white oaks in Europe has also been confirmed by fossil pollen profiles [41][42][43]. In Central Europe, different haplotype lineages came together and increased the mixture and diversity [8]. This was also the case for several other tree species [44].
Thus, main directions of this gene flow by seeds were south to north-east in West and Central Europe and south-north and east-west in East Europe. Besides this large-scale pattern, also the influence of manmade seed transfer is visible in Central Europe on a local scale [8,9]. Therefore, it is known that pedunculate oaks have been introduced to western Germany on purpose from Slavonia and this seed transfer is still visible by different haplotypes compared to the oak stands in the neighborhood [45]. However, so far, the impact of this long-distance manmade seed transfer on the regional and large scale spatial genetic structure of pedunculate oak is limited (see two examples marked with arrows in the We computed the Pearson correlation coefficient between the different genetic groups for K2 to K5 and the admixture proportion of Quercus petraea (Supplementary S2) for all 197 locations and observed a significant positive correlation (p < 0.001) with the first genetic group indicated as blue color in Figures 5a,b, 6a,b and 7b: r = 0.51 for K2_1, r = 0.72 for K3_1, r = 0.77 for K4_1, and r = 0.81 for K5_1.

Genetic Correlation among Nuclear and Organelle Genome
We observed a significant positive Pearson's correlation coefficient of r = 0.37 (p < 0.0001) among the pairs of gene pool distances at the 359 nuclear gene markers and the genetic distances of the haplotypes. In Germany, we found a similar pattern of genetic differentiation for both types of genomes with a nearly identical contact zone (Figure 7a,b). However, due to the high level of haplotype diversity, the correlation coefficient among the matrix of genetic distances of nuclear gene pools and haplotypes including only locations in Germany had a correlation of only r = 0.15 (p < 0.001).

Impact of Seed and Pollen Mediated Gene Flow
For both the nuclear and organelle SNPs a positive spatial autocorrelation up to 1250 km was observed. The strength of genetic differentiation was much larger for the haplotypes. The distribution pattern of haplotypes is best explained by recolonization after the last glacial period from refugia in the Iberian Peninsula, Italy, the Balkan region, southern Russia, and the southern Ural region [1,2,4]. The locations of refugia and the postglacial recolonization history of white oaks in Europe has also been confirmed by fossil pollen profiles [41][42][43]. In Central Europe, different haplotype lineages came together and increased the mixture and diversity [8]. This was also the case for several other tree species [44].
Thus, main directions of this gene flow by seeds were south to north-east in West and Central Europe and south-north and east-west in East Europe. Besides this large-scale pattern, also the influence of manmade seed transfer is visible in Central Europe on a local scale [8,9]. Therefore, it is known that pedunculate oaks have been introduced to western Germany on purpose from Slavonia and this seed transfer is still visible by different haplotypes compared to the oak stands in the neighborhood [45]. However, so far, the impact of this long-distance manmade seed transfer on the regional and large scale spatial genetic structure of pedunculate oak is limited (see two examples marked with arrows in the maps of Figure 7a,b). The limited impact might be explained by the difficulties for humans to store oak acorns for longer time and the weight of acorns. Both were factors influencing transport distances in the past, especially in the time before railway carrying was an option. In contrast, for conifers with small seeds that can be stored for several years the impact of manmade seed transfer all over Europe is much more pronounced [46].
For oaks with their heavy acorns, natural seed dispersal by wind and to a smaller extent by birds, especially the Eurasian jay, is limited. Wind dispersal of seeds is below 50 m in most cases [47] and the jay can transport seeds up to a few hundred meters and in rare cases a few kilometers [48]. Apparently, the capacities of seed dispersal were not strong enough during the time since recolonization to create a stronger mixture of haplotypes. We observed a mixture of haplotypes where different recolonization pathways came together but no further mixture occurred in the way that eastern haplotypes were present far in the west and vice versa.
In contrast to the gene flow by seeds, we expected a gene flow by pollen over much larger distances due to wind-dispersal. Most experimental studies on pollen-mediated gene flow were done on rather local scales. The effective pollen dispersal was often less than 100 m and in rare cases up to a few hundred meters [49,50]. There are, however, contrasting findings on the importance of long-distance gene flow by pollen of pedunculate oaks. Buschbom, Yanbaev and Degen [21] found, in a small isolated population of Q. robur in the southern Urals, that about 30% of the effective pollen came from a distance of at least 80 km. Additionally, Gerber, et al. [51] demonstrated relevant proportions of successful pollen in oaks coming from outside the studied stands. Moracho et al. [52], on the other hand, found little evidence for gene flow from outside the local stand in fragmented pedunculate oak populations in Spain. Several factors influence the impact of external long-distance pollen [22,53]: the synchronization of flowering, abundance of local pollen compared to external pollen, viability of pollen in relation to UV radiation, wind direction and strength. Some of these factors have been considered in a modeling approach on dispersal of oak pollen by Schueler and Schlunzen [54]. They found that physical pollen dispersal up to 100 km is possible, but to which extent this pollen could be part of fertilization was still unclear.
For the nuclear genome of the oaks which is dispersed by pollen and seeds we would have expected either no or only a relatively weak spatial autocorrelation on larger scales [55,56]. Nevertheless, we found spatial autocorrelation on the same scale up to 1250 km as for the maternal haplotypes. The genetic differentiation at the nuclear gene loci was much weaker compared to the organelle gene loci which can be interpreted as an effect of the stronger gene flow by pollen. The weak positive correlation of 0.37 among the matrix of gene pool distances at nuclear and organelle gene markers underlines that at least partly different forces or different geographic directions of gene flow are responsible for the observed genetic differentiation of the two parts of the genome.
The STRUCTURE analysis with an aggregation of the data at the nuclear gene loci to different gene pools did show a higher mixture of gene pools compared to the distribution pattern of haplotypes. In contrast to the main direction of recolonization (south-north) there was evidence of an intensive pollen-mediated gene flow from east to west and west to east. Nevertheless, there was still a clear isolation by distance effect [57] that produced the significant spatial autocorrelation at the nuclear gene loci and that enables the detection of different gene pools with STRUCTURE. Additionally, the relatively sharp contact zone of different nuclear gene pools in Germany (Figure 7b) raises questions on the effectiveness of long-distance pollen-mediated gene flow.

Hybridization and Introgression
Hybridization among Q. robur and Q. petraea occurs with low frequency in mixed oak stands [18,58]. Hybridization leads to introgression if the interspecific gene flow confers an adaptive advantage or is selectively neutral at least [14,59]. For the maternal haplotypes, it is well established that the same haplotypes can be found in the same regions in different white oak species, especially Q. robur and Q. petraea, and this is explained by a certain level of hybridization among related oak species [60], which could even be asymmetric [61]. The impacts of hybridization and introgression on shared haplotypes have also been reported for other deciduous tree species such as Betula in North America [62].
The natural distribution range of pedunculate and sessile oak is different: Quercus robur has a distribution a few thousand km more to the east up to the Ural Mountains ( Figure 1). Hence, if hybridization among these two oak species is shaping the gene pool, it should not be the same in all regions but mainly plays a role in West and Central Europe.
Using genetic data of 4848 sessile and pedunculate oaks from the whole species distribution ranges, we estimated the admixture level of all trees involved in this study (Supplementary S2). We only included in the present paper those Q. robur trees with an admixture of Q. petraea of less than 20%. The analysis with STRUCTURE showed that parts of the western gene pool of the pedunculate oaks were correlated with the admixture level of Q. petraea (Supplementary S2). Especially in Germany, we found a sharp border between pedunculate oaks in the north-west influenced by this introgression and pedunculate oaks in the south-west and east composed by other gene pools with no sign of introgression. This sharp border was also found in respect to the composition of haplotypes. We conclude from the similarity of these contact zone (Figure 7a,b) that the present day spatial genetic structure of nuclear genetic information still has footprints of postglacial recolonization. The introgression happened either in the refugia or during the recolonization phase [63]. West and north-west Germany have been colonized from the Iberian-lineage that has still signs of introgression, whereas the south-east of Germany has been colonized by oaks from another refugia lineage (refugia from Italy and the Balkan region) that did not experience the same level of introgression from Q. petraea. Gene flow by pollen was not strong enough to equalize these genetic differences. Thus, we conclude that the introgression occurred in the past in the west and migrated in direction north-east.
For other broadleaved tree species, e.g., Betula alleghaniensis and Betula papyrifera, in North America also large-scale patterns of different levels of introgression have been observed [17]. This study on the Betula species showed how ongoing hybridization could impact the distribution of haplotypes and obliterate the large-scale genetic structure resulting from recolonization from glacial refugia. In contrast, our results showed that historical introgression could even strengthen the large-scale genetic pattern caused by recolonization from glacial refugia.
One factor influencing the level of hybridization is simply the level of mixture and abundance of both oak species [58]. In this aspect, we would had expected higher ratios of hybridization in the south-west of Germany. However, this region is part of the area with a low level of introgression further emphasizing the importance of historic rather than current hybridization events shaping large-scale genetic patterns.

Genetic Selection
Can genetic selection explain the large-scale pattern of the nuclear gene loci? First, it should be noted that we did not follow a candidate gene approach for the gene marker development as in other studies [15,55]. Moreover, our selection of candidate SNPs in the nuclear genome was determined by the occurrence of restriction sites in the ddRAD approach [23]. In a blast search of all probe sequences including our SNPs against the reference genome of Q. robur, we estimated a uniform genome-wide genetic marker distribution [64]. In principle, many of the identified markers might also be adaptive as we found about half of all SNPs to be located within gene annotations. It is unlikely, however, that many of the studied SNPs are under selection for traits that are linked to large-scale environmental gradients. Future studies with whole genome inventories may clarify the impact of adaptation on the genetic structure.

Conclusions
Pedunculate oak exhibits marked genetic patterns on the European continental scale for both the nuclear and the organelle genomes. The pattern of the organelle DNA is still reflecting the impact of post-glacial recolonization from different refugia in the south. Central Europe is a region with a high level of haplotype diversity because of the merging of different recolonization lineages. Recent gene flow by seeds did not erase this historical recolonization pattern of haplotypes and, so far, also manmade seed transfer had only local influence on the spatial genetic pattern. The intensity of historical introgression of Q. petraea into the gene pool of Q. robur was not the same for all recolonization lineages and thus a part of the large-scale genetic structure for the nuclear genome can be explained by differences of historical introgression. This added an intraspecific gene flow from west to east. In addition, the pollen-mediated gene flow modified the large-scale structure at the nuclear gene markers with distinct east-west and west-east gradients leading to increased admixture of the gene pools.